[01:20:41] (03PS1) 10Reedy: Collapse PHP_SAPI conditionals down into one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393355
[02:07:52] (03PS1) 10Andrew Bogott: role::puppetmaster::standalone: allow specification of puppet_major_version [puppet] - 10https://gerrit.wikimedia.org/r/393357
[02:08:13] (03CR) 10jerkins-bot: [V: 04-1] role::puppetmaster::standalone: allow specification of puppet_major_version [puppet] - 10https://gerrit.wikimedia.org/r/393357 (owner: 10Andrew Bogott)
[02:10:24] (03CR) 10Andrew Bogott: [V: 032 C: 032] role::puppetmaster::standalone: allow specification of puppet_major_version [puppet] - 10https://gerrit.wikimedia.org/r/393357 (owner: 10Andrew Bogott)
[02:11:25] 10Operations, 10Performance-Team, 10Traffic: load.php requests taking multiple minutes - https://phabricator.wikimedia.org/T181315#3786326 (10Tgr) (The file had to be deleted because I messed up and made it public. See also T181317. Can provide it on request though.)
[02:11:42] 10Operations, 10Performance-Team, 10Traffic: load.php requests taking multiple minutes - https://phabricator.wikimedia.org/T181315#3786328 (10Tgr)
[02:47:42] (03PS1) 10Andrew Bogott: puppetmaster.erb: allow switching of puppetmaster_rack_path [puppet] - 10https://gerrit.wikimedia.org/r/393358
[03:24:54] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 762.31 seconds
[03:59:05] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 178.73 seconds
[04:02:41] (03PS13) 10TerraCodes: $wmf* -> $wmg* [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[04:03:49] (03CR) 10jerkins-bot: [V: 04-1] $wmf* -> $wmg* [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956) (owner: 10TerraCodes)
[04:28:19] (03PS1) 10TerraCodes: update [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393360
[04:29:35] (03CR) 10jerkins-bot: [V: 04-1] update [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393360 (owner: 10TerraCodes)
[04:40:46] (03PS14) 10TerraCodes: $wmf* -> $wmg* [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[04:41:26] (03Abandoned) 10TerraCodes: update [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393360 (owner: 10TerraCodes)
[04:41:32] (03CR) 10jerkins-bot: [V: 04-1] $wmf* -> $wmg* [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956) (owner: 10TerraCodes)
[04:44:40] (03PS15) 10TerraCodes: $wmf* -> $wmg* [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[05:18:24] PROBLEM - Nginx local proxy to apache on mw2133 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:19:14] RECOVERY - Nginx local proxy to apache on mw2133 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.200 second response time
[05:49:14] 10Operations, 10Developer-Relations, 10cloud-services-team (Kanban): Create discourse-mediawiki.wmflabs.org (pilot instance) - https://phabricator.wikimedia.org/T180854#3786373 (10bd808) >>! In T180854#3785110, @Qgil wrote: > Could these upgrades via UI make the requirements for maintenace simpler? This typ...
[05:57:27] 10Operations, 10Developer-Relations, 10cloud-services-team (Kanban): Create discourse-mediawiki.wmflabs.org (pilot instance) - https://phabricator.wikimedia.org/T180854#3786376 (10bd808) >>! In T180854#3783598, @Qgil wrote: >>>! In T180854#3778284, @bd808 wrote: >> I would also recommend that the deployment...
[06:31:04] PROBLEM - puppet last run on mw2173 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/var/lib/hphpd/hphpd.ini]
[06:56:04] RECOVERY - puppet last run on mw2173 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:48:54] 10Operations, 10DBA, 10Wikimedia-Site-requests: Global rename of JeanBono → Rexcornot: supervision needed - https://phabricator.wikimedia.org/T181170#3786384 (10alanajjar) @Marostegui online now?
[08:27:31] (03PS1) 10ArielGlenn: abstract recombine job needs to read the gzipped input files [dumps] - 10https://gerrit.wikimedia.org/r/393364
[08:28:30] (03CR) 10ArielGlenn: [C: 032] abstract recombine job needs to read the gzipped input files [dumps] - 10https://gerrit.wikimedia.org/r/393364 (owner: 10ArielGlenn)
[08:29:54] PROBLEM - puppet last run on labtestneutron2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:32:42] !log ariel@tin Started deploy [dumps/dumps@ec21673]: fix abstracts recombine job
[08:32:44] !log ariel@tin Finished deploy [dumps/dumps@ec21673]: fix abstracts recombine job (duration: 00m 02s)
[08:32:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:54] RECOVERY - puppet last run on labtestneutron2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:05:53] 10Operations, 10DBA, 10Wikimedia-Site-requests: Global rename of JeanBono → Rexcornot: supervision needed - https://phabricator.wikimedia.org/T181170#3786442 (10Marostegui) No, will need to wait till Monday
[10:06:42] 10Operations, 10DBA, 10Wikimedia-Site-requests: Global rename of JeanBono → Rexcornot: supervision needed - https://phabricator.wikimedia.org/T181170#3786443 (10alanajjar) @Marostegui okay
[10:38:42] (03PS1) 10ArielGlenn: extend command line length for dumps status files tarball creation [puppet] - 10https://gerrit.wikimedia.org/r/393366
[10:40:09] (03CR) 10ArielGlenn: [C: 032] extend command line length for dumps status files tarball creation [puppet] - 10https://gerrit.wikimedia.org/r/393366 (owner: 10ArielGlenn)
[11:17:44] PROBLEM - carbon-frontend-relay metric drops on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [100.0] %27https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?orgId=1panelId=21fullscreen%27+%27https://grafana.wikimedia.org/dashboard/db/graphite-codfw?orgId=1panelId=21fullscreen%27
[11:18:55] PROBLEM - puppet last run on graphite1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/lib/nagios/plugins/check-fresh-files-in-dir.py]
[11:20:44] RECOVERY - carbon-frontend-relay metric drops on graphite1001 is OK: OK: Less than 80.00% above the threshold [25.0] %27https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?orgId=1panelId=21fullscreen%27+%27https://grafana.wikimedia.org/dashboard/db/graphite-codfw?orgId=1panelId=21fullscreen%27
[11:43:55] RECOVERY - puppet last run on graphite1001 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[12:18:34] PROBLEM - Apache HTTP on mw2103 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:19:24] RECOVERY - Apache HTTP on mw2103 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.120 second response time
[12:46:46] (03Draft2) 10Jayprakash12345: Enable AdvancedSearch in Arabic Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393369
[12:47:58] (03PS3) 10Jayprakash12345: Enable AdvancedSearch in Arabic Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393369 (https://phabricator.wikimedia.org/T180291)
[13:04:05] PROBLEM - carbon-frontend-relay metric drops on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [100.0] %27https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?orgId=1panelId=21fullscreen%27+%27https://grafana.wikimedia.org/dashboard/db/graphite-codfw?orgId=1panelId=21fullscreen%27
[13:05:05] RECOVERY - carbon-frontend-relay metric drops on graphite1001 is OK: OK: Less than 80.00% above the threshold [25.0] %27https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?orgId=1panelId=21fullscreen%27+%27https://grafana.wikimedia.org/dashboard/db/graphite-codfw?orgId=1panelId=21fullscreen%27
[13:30:24] PROBLEM - HHVM rendering on mw2242 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:31:14] RECOVERY - HHVM rendering on mw2242 is OK: HTTP OK: HTTP/1.1 200 OK - 73671 bytes in 0.291 second response time
[13:37:45] 10Operations, 10Graphite: cpjobqueue spamming statsd metrics - https://phabricator.wikimedia.org/T181333#3786600 (10fgiunchedi)
[13:37:54] 10Operations, 10Graphite: cpjobqueue spamming statsd metrics - https://phabricator.wikimedia.org/T181333#3786612 (10fgiunchedi) p:05Triage>03Unbreak!
[13:40:32] !log drop incoming statsd from scb to graphite1001 temporarily - T181333
[13:40:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:40:41] T181333: cpjobqueue spamming statsd metrics - https://phabricator.wikimedia.org/T181333
[13:41:02] 10Operations, 10Services, 10Graphite: cpjobqueue spamming statsd metrics - https://phabricator.wikimedia.org/T181333#3786616 (10fgiunchedi)
[13:48:44] (03PS1) 10ArielGlenn: don't preserve timestamp of dump statusfiles tarball during rsync [puppet] - 10https://gerrit.wikimedia.org/r/393372
[13:50:03] (03CR) 10ArielGlenn: [C: 032] don't preserve timestamp of dump statusfiles tarball during rsync [puppet] - 10https://gerrit.wikimedia.org/r/393372 (owner: 10ArielGlenn)
[13:51:15] PROBLEM - carbon-frontend-relay metric drops on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [100.0] %27https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?orgId=1panelId=21fullscreen%27+%27https://grafana.wikimedia.org/dashboard/db/graphite-codfw?orgId=1panelId=21fullscreen%27
[13:52:15] RECOVERY - carbon-frontend-relay metric drops on graphite1001 is OK: OK: Less than 80.00% above the threshold [25.0] %27https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?orgId=1panelId=21fullscreen%27+%27https://grafana.wikimedia.org/dashboard/db/graphite-codfw?orgId=1panelId=21fullscreen%27
[13:58:35] 10Operations, 10Services, 10Graphite: cpjobqueue spamming statsd metrics - https://phabricator.wikimedia.org/T181333#3786649 (10fgiunchedi) More spam, from kafka topic for cpjobqueue, note the repeated `retry_change-prop_retry_change-prop_retry_change-prop_retry_change-prop_retry_change-prop_retry_change-`...
[14:10:37] !log roll-restart cpjobqueue to alleviate metrics leak - T181333
[14:10:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:43] T181333: cpjobqueue spamming statsd metrics - https://phabricator.wikimedia.org/T181333
[14:23:14] https://ganglia.wikimedia.org/latest/?r=week&cs=&ce=&c=API+application+servers+eqiad&h=&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name .. looks like ganglia has stopped recording metrics here for over 2 days ..
[14:25:55] 10Operations, 10Services, 10Graphite: cpjobqueue spamming statsd metrics - https://phabricator.wikimedia.org/T181333#3786683 (10fgiunchedi) Looks like cxserver is having the same problem wrt repeated gc metrics ``` 14:24:39.537511 IP scb1003.eqiad.wmnet.50926 > graphite1001.eqiad.wmnet.8125: UDP, length 142...
[14:26:53] !log restart cxserver on scb100[34] - T181333
[14:27:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:27:01] T181333: cpjobqueue spamming statsd metrics - https://phabricator.wikimedia.org/T181333
[14:28:48] subbu: yeah ganglia is going away
[14:30:32] ok. where is the equivalent grafana graph that for those?
s/that//
[14:32:31] subbu: https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?from=now-3h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=appserver&cluster=appserver&orgId=1
[14:34:04] !log rolling restart of cxserver to alleviate metrics leak - T181333
[14:34:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:10] T181333: cpjobqueue spamming statsd metrics - https://phabricator.wikimedia.org/T181333
[14:35:06] thx
[14:40:25] (03PS1) 10ArielGlenn: add top level index files to dump status file tarball for rsync [puppet] - 10https://gerrit.wikimedia.org/r/393374
[14:42:58] (03CR) 10ArielGlenn: [C: 032] add top level index files to dump status file tarball for rsync [puppet] - 10https://gerrit.wikimedia.org/r/393374 (owner: 10ArielGlenn)
[14:48:05] PROBLEM - Host mc2026 is DOWN: PING CRITICAL - Packet loss = 100%
[14:49:34] RECOVERY - Host mc2026 is UP: PING OK - Packet loss = 0%, RTA = 36.13 ms
[15:05:16] 10Operations, 10Services, 10Graphite: cpjobqueue spamming statsd metrics - https://phabricator.wikimedia.org/T181333#3786699 (10fgiunchedi) Statsite and network graphs from graphite1001 {F10994913} {F10994912}
[15:08:41] 10Operations, 10Services, 10Graphite: cpjobqueue spamming statsd metrics - https://phabricator.wikimedia.org/T181333#3786700 (10fgiunchedi) Top 20 sent metrics from scb in eqiad: ``` scb1002:~$ sudo timeout 1m ngrep -q -W byline . udp dst port 8125 | grep -v -e '^U ' -e '^$' | sed -e 's/:.*//' | sort | cut...
[15:32:32] !og restarted statsd-proxy on graphite1001 (died during investigation)
[15:36:49] volans you missed l :)
[15:36:53] L
[15:37:08] oh, my bad...
[15:37:12] !og restarted statsd-proxy on graphite1001 (died during investigation) T181333
[15:37:13] T181333: cpjobqueue spamming statsd metrics - https://phabricator.wikimedia.org/T181333
[15:37:17] !log restarted statsd-proxy on graphite1001 (died during investigation) T181333
[15:37:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:31] * volans maybe can do it :D
[15:39:00] thanks for spotting it
[15:45:19] !log ppchelko@tin Started deploy [cpjobqueue/deploy@e35aa05]: Rollback. Disable GC metric reporting T181333
[15:45:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:45:26] T181333: cpjobqueue spamming statsd metrics - https://phabricator.wikimedia.org/T181333
[15:45:50] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@e35aa05]: Rollback. Disable GC metric reporting T181333 (duration: 00m 31s)
[15:45:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:11] !log unban statsd traffic from scb on graphite1001 - T181333
[16:05:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:18] T181333: cpjobqueue spamming statsd metrics - https://phabricator.wikimedia.org/T181333
[16:05:32] !log kartik@tin Started deploy [cxserver/deploy@11aecc9]: Update cxserver to 0c242c0, Pin service-runner to 2.4.2
[16:05:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:01] !log kartik@tin Finished deploy [cxserver/deploy@11aecc9]: Update cxserver to 0c242c0, Pin service-runner to 2.4.2 (duration: 03m 29s)
[16:09:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:27] godog: Done. See if that helps.
[16:10:23] kart_: will do! looking ok so far
[16:10:42] cool.
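(Editor's note: the "Top 20 sent metrics" paste in T181333 above is truncated after `| sort | cut...`, so the aggregation step below is an assumption, not the exact command used; this is a minimal sketch of that kind of one-minute statsd capture, assuming it runs on an scb host and that metrics leave as plain UDP datagrams on port 8125.)

```
sudo timeout 1m ngrep -q -W byline . udp dst port 8125 |
  grep -v -e '^U ' -e '^$' |   # drop ngrep's per-packet header lines and blank lines
  sed -e 's/:.*//' |           # keep only the metric name, strip the "value|type" part
  sort | uniq -c | sort -rn | head -20   # rank the 20 most frequently sent metric names
```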
[19:22:50] PROBLEM - MariaDB Slave Lag: s5 on db1051 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 492.42 seconds
[19:23:21] Lets see
[19:23:37] It is an vslow slave
[19:23:40] marostegui: I'm here too
[19:23:54] o/
[19:24:01] \o
[19:24:03] haha
[19:24:26] I am going to bet RAID problems
[19:24:36] I do not see traffic shifts
[19:24:39] <_joe_> i am here too ftr
[19:25:21] OK: optimal, 1 logical, 2 physical, WriteBack policy
[19:25:30] raid looks good yeah
[19:27:00] there is one disk with 2000 errors, though
[19:27:10] that is the other typical RAID issue
[19:27:17] increasing?
[19:27:28] bad disk not yet removed from the group
[19:27:31] there are few slow queries on tendril in the last few minutes, probably effect
[19:27:58] yes, increasing
[19:28:07] then that disk is probably the cause
[19:28:27] I would remove the disk, it is out of warranty anyway
[19:28:57] so just dumb raid controller that didn't detect it yet?
[19:29:19] and properly documented: https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Disks_about_to_fail
[19:29:27] jynus: which disk?
[19:29:32] Enclosure Device ID: 32
[19:29:32] Slot Number: 3
[19:29:49] Agreed?
[19:29:51] interesting, the predictive failure is for slot 3 and 8
[19:30:36] let's log: 32:3
[19:30:39] PD:1 slot 3 and PD 0 slot 8 have predictive failure counts of 4 both
[19:30:54] 32:3 yes
[19:31:14] This would be it: megacli -PDOffline -PhysDrv \[32:3\] -aALL
[19:31:23] yes
[19:31:32] or '[32:3]'
[19:31:37] at your will
[19:31:39] !log Set 32:3 disk to offline on db1051
[19:31:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:31:54] done
[19:32:05] 52406 (564953513s/0x0001/CRIT) - VD 00/0 is now DEGRADED
[19:32:11] lag recovering
[19:32:13] lag reducing
[19:32:16] ah nice
[19:32:32] volans: we will get a normal degraded raid task, right?
[19:32:38] so we don't have to open it :)
[19:32:41] marostegui: yep, give it few minutes
[19:32:45] checking now: https://grafana.wikimedia.org/dashboard/db/mysql?panelId=6&fullscreen&orgId=1&from=now-6h&to=now
[19:32:46] sure sure
[19:32:49] I'm forcing the recheck
[19:33:01] it is going down
[19:33:28] db1051 is getting old
[19:33:32] and so is db1052
[19:33:34] PROBLEM - MegaRAID on db1051 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[19:33:36] ACKNOWLEDGEMENT - MegaRAID on db1051 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T181345
[19:33:40] 10Operations, 10ops-eqiad: Degraded RAID on db1051 - https://phabricator.wikimedia.org/T181345#3786844 (10ops-monitoring-bot)
[19:33:41] :)
[19:33:44] there you go ;)
[19:34:03] lag back to 0
[19:34:41] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1051 - https://phabricator.wikimedia.org/T181345#3786849 (10Marostegui) We forced this disk to OFFLINE as it was having errors and it was making db1051 lag. As soon as it was set to OFFLINE the lag started to recover @Cmjohnson can we get it replaced? Thanks!
[19:34:56] it it had been trafic changes, we could have pooled it as 0
[19:35:00] RECOVERY - MariaDB Slave Lag: s5 on db1051 is OK: OK slave_sql_lag Replication lag: 0.43 seconds
[19:35:05] to slow down the whole wiki
[19:35:14] but it was good that it was at 0
[19:35:19] so it didn't affect it
[19:35:47] for much, I assume semisinc kicked in for a while
[19:36:12] so, nothing else to see here
[19:36:39] https://www.youtube.com/watch?v=5NNOrp_83RU
[19:37:06] lol
[19:37:13] * volans back off
[19:38:46] I am off too
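(Editor's note: a minimal sketch of the check-then-offline sequence discussed in the db1051 incident above, assuming the LSI MegaCli utility is installed on the db host. Only the -PDOffline command is quoted verbatim in the channel; the inspection commands are standard MegaCli usage, not pasted from the incident.)

```
# 1. Logical drive state (should report "Optimal" when healthy):
megacli -LDInfo -LAll -aAll
# 2. Per-disk error counters; a disk with a climbing "Media Error Count" or
#    "Predictive Failure Count" is the usual culprit behind sudden replication lag:
megacli -PDList -aAll | grep -E 'Enclosure Device ID|Slot Number|Media Error Count|Predictive Failure Count'
# 3. Force the bad disk (enclosure 32, slot 3 in this case) out of the array so the
#    controller stops waiting on it; the RAID goes Degraded and icinga auto-files a task:
megacli -PDOffline -PhysDrv '[32:3]' -aALL
```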