[01:56:07] <icinga-wm>	 PROBLEM - MegaRAID on db1001 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[01:56:08] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on db1001 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T183708
[01:56:12] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on db1001 - https://phabricator.wikimedia.org/T183708#3861352 (10ops-monitoring-bot)
[02:06:46] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1001 - https://phabricator.wikimedia.org/T183708#3861355 (10Peachey88)
[03:24:36] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 789.72 seconds
[03:50:37] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 116.17 seconds
[05:54:36] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp4024 is CRITICAL: CRITICAL: expiry mailbox lag is 2082652
[07:23:06] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1001 - https://phabricator.wikimedia.org/T183708#3861434 (10Marostegui) a:03Cmjohnson Even though this server will be decommissioned (hopefully) during next Q, let's get the disk replaced when possible We should have plenty of 300G disks from the old dec...
[07:29:27] <icinga-wm>	 PROBLEM - HHVM rendering on mw2134 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:30:17] <icinga-wm>	 RECOVERY - HHVM rendering on mw2134 is OK: HTTP OK: HTTP/1.1 200 OK - 75025 bytes in 0.314 second response time
[07:41:17] <wikibugs>	 (03PS1) 10Marostegui: install_server: Allow reinstall db1113,db1114 [puppet] - 10https://gerrit.wikimedia.org/r/400268 (https://phabricator.wikimedia.org/T182896)
[07:41:22] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: Rack and setup db1111 and db1112 - https://phabricator.wikimedia.org/T180788#3861442 (10Marostegui)
[07:41:55] <wikibugs>	 (03Draft2) 10Jayprakash12345: Add new namespace aliases on zhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/400267
[07:42:21] <wikibugs>	 (03PS3) 10Jayprakash12345: Add new namespace aliases on zhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/400267 (https://phabricator.wikimedia.org/T183711)
[07:43:26] <wikibugs>	 (03CR) 10Marostegui: [C: 032] install_server: Allow reinstall db1113,db1114 [puppet] - 10https://gerrit.wikimedia.org/r/400268 (https://phabricator.wikimedia.org/T182896) (owner: 10Marostegui)
[08:08:28] <wikibugs>	 (03CR) 10TerraCodes: [C: 031] "Wouldn't it break things like global userpages having "hi, you can contact me via my email if its private"?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/397768 (https://phabricator.wikimedia.org/T182541) (owner: 10EddieGP)
[08:15:40] <wikibugs>	 (03CR) 10Marostegui: [C: 031] mariadb: Repool db1055 & db1056 as x1 replicas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399782 (https://phabricator.wikimedia.org/T183470) (owner: 10Jcrespo)
[08:22:23] <wikibugs>	 (03CR) 10Marostegui: mariadb: Decommissioning proposal (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399792 (https://phabricator.wikimedia.org/T134476) (owner: 10Jcrespo)
[09:09:02] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): [C: 031] Fix linewrap issue on wikimedia error page (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/395552 (https://phabricator.wikimedia.org/T180656) (owner: 10Phantom42)
[09:13:26] <icinga-wm>	 PROBLEM - Apache HTTP on mw2125 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:14:17] <icinga-wm>	 RECOVERY - Apache HTTP on mw2125 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.119 second response time
[09:23:34] <wikibugs>	 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3861506 (10Marostegui)
[09:46:19] <wikibugs>	 (03CR) 10星耀晨曦: [C: 031] Add new namespace aliases on zhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/400267 (https://phabricator.wikimedia.org/T183711) (owner: 10Jayprakash12345)
[09:59:28] <wikibugs>	 (03PS4) 10ArielGlenn: move ferm rules for nfs out from dumps module to a profile [puppet] - 10https://gerrit.wikimedia.org/r/400244
[10:00:01] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] move ferm rules for nfs out from dumps module to a profile [puppet] - 10https://gerrit.wikimedia.org/r/400244 (owner: 10ArielGlenn)
[10:02:39] <wikibugs>	 (03PS5) 10ArielGlenn: move ferm rules for nfs out from dumps module to a profile [puppet] - 10https://gerrit.wikimedia.org/r/400244
[10:03:13] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] move ferm rules for nfs out from dumps module to a profile [puppet] - 10https://gerrit.wikimedia.org/r/400244 (owner: 10ArielGlenn)
[10:04:44] <wikibugs>	 (03PS6) 10ArielGlenn: move ferm rules for nfs out from dumps module to a profile [puppet] - 10https://gerrit.wikimedia.org/r/400244
[10:06:26] <icinga-wm>	 PROBLEM - HHVM rendering on mw2138 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:07:17] <icinga-wm>	 RECOVERY - HHVM rendering on mw2138 is OK: HTTP OK: HTTP/1.1 200 OK - 75073 bytes in 0.296 second response time
[10:38:09] <wikibugs>	 (03PS7) 10ArielGlenn: move ferm rules for nfs out from dumps module to a profile [puppet] - 10https://gerrit.wikimedia.org/r/400244
[10:40:26] <wikibugs>	 (03CR) 10ArielGlenn: [C: 032] move ferm rules for nfs out from dumps module to a profile [puppet] - 10https://gerrit.wikimedia.org/r/400244 (owner: 10ArielGlenn)
[10:47:16] <wikibugs>	 (03PS1) 10ArielGlenn: don't export dumps web server filesystems to snapshots, they don't use it [puppet] - 10https://gerrit.wikimedia.org/r/400386
[11:05:05] <wikibugs>	 (03CR) 10ArielGlenn: [C: 032] don't export dumps web server filesystems to snapshots, they don't use it [puppet] - 10https://gerrit.wikimedia.org/r/400386 (owner: 10ArielGlenn)
[11:09:31] <wikibugs>	 (03PS1) 10ArielGlenn: allow dumps nfs server to be configured without clients if needed [puppet] - 10https://gerrit.wikimedia.org/r/400387
[11:09:39] <wikibugs>	 (03CR) 10EddieGP: "> Wouldn't it break things like global userpages having "hi, you can" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/397768 (https://phabricator.wikimedia.org/T182541) (owner: 10EddieGP)
[11:14:37] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp4024 is OK: OK: expiry mailbox lag is 0
[11:16:12] <wikibugs>	 (03CR) 10ArielGlenn: [C: 032] allow dumps nfs server to be configured without clients if needed [puppet] - 10https://gerrit.wikimedia.org/r/400387 (owner: 10ArielGlenn)
[11:31:10] <wikibugs>	 (03PS1) 10ArielGlenn: create a profile for nginx-extras package for dumps [puppet] - 10https://gerrit.wikimedia.org/r/400391
[11:32:49] <wikibugs>	 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey: Migration of mw* servers to stretch - https://phabricator.wikimedia.org/T174431#3861568 (10elukey) mw2246 today reported a failure in logrotate:  ``` /etc/cron.daily/logrotate: Job for apache2.service failed because the control process exited with er...
[11:33:00] <elukey>	 Cc: volans --^ :)
[11:33:51] <wikibugs>	 10Operations, 10Mail, 10MediaWiki-Watchlist: Mails from MediaWiki seem to get (partially) lost - https://phabricator.wikimedia.org/T121105#3861569 (10hoo) >>! In T121105#3860761, @Aklapper wrote: > @hoo, @Lydia_Pintscher : Still an issue, 18 months later? Or should this task be closed?  I haven't had any (ap...
[11:35:36] <volans>	 elukey: ?
[11:36:28] <elukey>	 volans: cronspam from some videoscalers, I thought to ping you since the last time we had a chat about it
[11:36:42] <wikibugs>	 10Operations, 10Mail, 10MediaWiki-Watchlist: Mails from MediaWiki seem to get (partially) lost - https://phabricator.wikimedia.org/T121105#3861570 (10Lydia_Pintscher) 05Open>03Resolved a:03Lydia_Pintscher Yeah it seems ok now.
[11:37:03] <elukey>	 it was a FYI, didn't mean to page you :P
[11:37:13] <volans>	 ah got it, and those are stretch right?
[11:37:30] <volans>	 no "page", don't worry ;)
[11:38:28] <volans>	 good to know, let's if it's common to all of them or just a race condition
[11:38:42] <volans>	 *let's see
[11:40:07] <wikibugs>	 (03CR) 10ArielGlenn: [C: 032] create a profile for nginx-extras package for dumps [puppet] - 10https://gerrit.wikimedia.org/r/400391 (owner: 10ArielGlenn)
[11:40:29] <marostegui>	 elukey: if you need to page volans just write cumin
[11:40:32] <marostegui>	 :p
[11:42:38] <volans>	 marostegui: rotfl... that's a lie :-P
[11:43:20] <marostegui>	 volans: we both know you have a notification for any cumin word, you just refused to admit it
[11:44:00] <volans>	 who knows, you know irssi notifications are so hard to implement 
[11:44:25] <marostegui>	 send me you config and I can check it for you :-p
[11:44:30] <volans>	 ahahahah
[11:45:04] <volans>	 it's not safe there could be PII in it :-P
[11:45:25] <marostegui>	 ~/.irssi# wc -l config
[11:45:26] <marostegui>	 434 config
[11:47:00] <volans>	 669 here, but there might be some boilerplate autoadded
[11:47:14] <volans>	 I just need to remove 3 lines to be perfect
[11:47:23] <marostegui>	 hahaha
[11:47:59] <Zackary>	 inb4 removes nickserv password
[11:48:52] <marostegui>	 volans: cat config | grep hilights  for me?
[11:49:51] <volans>	 lol
[11:50:57] <marostegui>	 2018 will be the year I will find out whether you have a notification for it or not (you do)
[11:51:38] <volans>	 marostegui: hilights = (
[11:51:53] <volans>	 :-P
[11:53:38] <volans>	 ok in 2018 I'll tell you
[11:54:39] <wikibugs>	 (03PS1) 10ArielGlenn: move ipv6 setup for dump web servers to the appropriate profiles [puppet] - 10https://gerrit.wikimedia.org/r/400394
[11:59:02] <wikibugs>	 (03CR) 10ArielGlenn: [C: 032] move ipv6 setup for dump web servers to the appropriate profiles [puppet] - 10https://gerrit.wikimedia.org/r/400394 (owner: 10ArielGlenn)
[13:28:20] <wikibugs>	 (03PS1) 10ArielGlenn: get rid of redundant code in dumps web server manifests [puppet] - 10https://gerrit.wikimedia.org/r/400403
[13:36:25] <wikibugs>	 10Operations, 10Developer-Relations, 10cloud-services-team (Kanban): Create discourse-mediawiki.wmflabs.org (pilot instance) - https://phabricator.wikimedia.org/T180854#3861687 (10Qgil)
[13:36:33] <wikibugs>	 (03PS2) 10ArielGlenn: get rid of redundant code in dumps web server manifests [puppet] - 10https://gerrit.wikimedia.org/r/400403
[13:39:37] <wikibugs>	 10Operations, 10Developer-Relations: Bring discourse.mediawiki.org to production - https://phabricator.wikimedia.org/T180853#3861689 (10Qgil)
[14:02:20] <wikibugs>	 10Operations, 10Developer-Relations, 10cloud-services-team (Kanban): Create discourse-mediawiki.wmflabs.org (pilot instance) - https://phabricator.wikimedia.org/T180854#3861707 (10Qgil) > Guidelines for pre-SSO usernames i.e. "user your Wikimedia username"?  If I am reading [[ https://meta.discourse.org/t/is...
[14:04:49] <wikibugs>	 (03CR) 10ArielGlenn: [C: 032] get rid of redundant code in dumps web server manifests [puppet] - 10https://gerrit.wikimedia.org/r/400403 (owner: 10ArielGlenn)
[14:13:47] <wikibugs>	 (03PS1) 10ArielGlenn: get rid of redundant code in dumps nfs server manifests [puppet] - 10https://gerrit.wikimedia.org/r/400405
[14:24:31] <wikibugs>	 (03CR) 10ArielGlenn: [C: 032] get rid of redundant code in dumps nfs server manifests [puppet] - 10https://gerrit.wikimedia.org/r/400405 (owner: 10ArielGlenn)
[15:35:06] <icinga-wm>	 PROBLEM - Varnish HTTP text-backend - port 3128 on cp4027 is CRITICAL: connect to address 10.128.0.127 and port 3128: Connection refused
[15:36:06] <icinga-wm>	 RECOVERY - Varnish HTTP text-backend - port 3128 on cp4027 is OK: HTTP OK: HTTP/1.1 200 OK - 218 bytes in 0.157 second response time
[15:41:33] <wikibugs>	 (03PS1) 10Urbanecm: Add suppressredirect to autoreview/editor at ruwikt [mediawiki-config] - 10https://gerrit.wikimedia.org/r/400409 (https://phabricator.wikimedia.org/T183719)
[15:44:26] <icinga-wm>	 PROBLEM - Long running screen/tmux on analytics1003 is CRITICAL: CRIT: Long running SCREEN process. (PID: 5624, 1733528s 1728000s).
[15:49:27] <icinga-wm>	 PROBLEM - pdfrender on scb1001 is CRITICAL: connect to address 10.64.0.16 and port 5252: Connection refused
[15:53:20] <mobrovac>	 on it ^
[15:53:33] <apergos>	 thanks, I was just looking
[15:54:06] <apergos>	 it seems to have been restarted a few minutes ago (?) 
[15:54:30] <logmsgbot>	 !log mobrovac@tin Started restart [electron-render/deploy@94d27d7]: Bounce Electron, stuck - T174916
[15:54:54] <mobrovac>	 heh
[15:55:26] <icinga-wm>	 PROBLEM - https://phabricator.wikimedia.org on phab1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:55:36] <icinga-wm>	 RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time
[15:55:42] <apergos>	 ah ha
[15:56:17] <icinga-wm>	 RECOVERY - https://phabricator.wikimedia.org on phab1001 is OK: HTTP OK: HTTP/1.1 200 OK - 34525 bytes in 0.238 second response time
[15:56:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:57] <stashbot>	 T174916: electron/pdfrender hangs - https://phabricator.wikimedia.org/T174916
[15:57:10] <_joe_>	 what did happen with phab?
[15:57:15] <_joe_>	 anyone has any ideas?
[15:57:32] <apergos>	 nope
[15:57:54] <apergos>	 i was in here because of pdfrender, but phab went and came back before I could even look 
[15:58:12] <_joe_>	 ok
[15:58:25] <_joe_>	 pdfrender was just one machine?
[15:58:34] <apergos>	 yep just the one this time
[15:58:47] <_joe_>	 ok
[15:58:50] <apergos>	 perhaps i should knock on wood or something...
[16:01:07] <akosiaris>	 phab transient error during holidays ? sigh
[16:01:12] <apergos>	 yeah
[16:01:40] <apergos>	 it ha already come back when I got the pages and realized that would interrupt everyone's relaxing vacation evening
[16:01:41] <apergos>	 meh
[16:01:43] <_joe_>	 akosiaris: seems so
[17:00:59] <twentyafterfour>	 !log restarting apache on phab1001
[17:01:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:08] <twentyafterfour>	 phab error is not entirely transient - there is a problem where something is eating up workers and keeping them marked as 'busy' until eventually it runs out of available workers - see 'apache connections' section on https://grafana.wikimedia.org/dashboard/db/phabricator?orgId=1
[17:02:26] <twentyafterfour>	 I think the problem has been unnoticed because I usually restart apache once a week for updates
[17:23:04] <apergos>	 oh? huh
[17:23:06] <apergos>	 thank you
[17:23:07] <icinga-wm>	 PROBLEM - Disk space on ms-be1033 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdk1 is not accessible: Input/output error
[17:27:27] <icinga-wm>	 PROBLEM - puppet last run on ms-be1033 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[mountpoint-/srv/swift-storage/sdk1]
[17:28:24] <apergos>	 that's legit, sdk shows erors in dmesg 
[17:37:05] <wikibugs>	 10Operations, 10ops-eqiad: failed disk on ms-be1033 - https://phabricator.wikimedia.org/T183723#3861840 (10ArielGlenn) p:05Triage>03Normal
[17:42:06] <icinga-wm>	 PROBLEM - HP RAID on ms-be1033 is CRITICAL: CRITICAL: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:2, 2I:2:3, 2I:2:4 - Failed: 2I:2:1 - Controller: OK - Battery/Capacitor: OK
[17:42:11] <icinga-wm>	 ACKNOWLEDGEMENT - HP RAID on ms-be1033 is CRITICAL: CRITICAL: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:2, 2I:2:3, 2I:2:4 - Failed: 2I:2:1 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T183724
[17:42:14] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on ms-be1033 - https://phabricator.wikimedia.org/T183724#3861857 (10ops-monitoring-bot)
[17:44:13] <marostegui>	 apergos: I would merge both tasks probably
[17:45:54] <apergos>	 meh, mine can be deleted, I forgot about the autogenerated ones
[17:45:58] <wikibugs>	 10Operations, 10ops-eqiad: failed disk on ms-be1033 - https://phabricator.wikimedia.org/T183723#3861865 (10Marostegui)
[17:46:01] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on ms-be1033 - https://phabricator.wikimedia.org/T183724#3861867 (10Marostegui)
[17:46:15] <apergos>	 although I did go looking to see if there was already a task for some reason
[17:46:20] <marostegui>	 I always merge the wrong direction XD
[17:46:23] <marostegui>	 let me fix it
[17:46:26] <apergos>	 thanks
[17:46:37] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on ms-be1033 - https://phabricator.wikimedia.org/T183724#3861857 (10Marostegui) 05duplicate>03Open
[17:46:42] <apergos>	 I am trying to decide whether to fiddle with removing the device and rebalancing the rings and etc
[17:47:18] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on ms-be1033 - https://phabricator.wikimedia.org/T183724#3861857 (10Marostegui)
[17:47:20] <wikibugs>	 10Operations, 10ops-eqiad: failed disk on ms-be1033 - https://phabricator.wikimedia.org/T183723#3861840 (10Marostegui)
[17:47:35] <marostegui>	 done
[17:47:50] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on ms-be1033 - https://phabricator.wikimedia.org/T183724#3861857 (10Marostegui) p:05Triage>03Normal
[17:48:37] <apergos>	 it seems like no one has done that via the documented method for many months though so
[17:48:43] <apergos>	 not sure if that's the approved method now
[17:48:48] <apergos>	 any thoughts?
[17:49:24] <marostegui>	 Never done it so...I cannot help :)
[17:49:45] <apergos>	 well the last time I did it there was no puppet repo for rings, so I'm very out of date
[17:50:01] <marostegui>	 Oh wow
[17:50:20] <marostegui>	 If we shut it down it will just get out from the LB right?
[17:51:38] <apergos>	 well I think it's better to leave it up, if swift itself will just lower the weight of the device 
[17:51:42] <apergos>	 but I can't remember that either
[17:53:01] <marostegui>	 Yeah, I would assume it would do that by itself
[17:57:49] <apergos>	 well it says here (random web page on open stack) that it's good to unmount the device because that will help swift work around the replication failure
[17:57:55] <apergos>	 I suppose puppet would undo that
[17:58:16] <apergos>	 https://docs.openstack.org/swift/newton/admin_guide.html#handling-drive-failure    we run 2.10 which is this version
[18:01:17] <icinga-wm>	 RECOVERY - Disk space on ms-be1033 is OK: DISK OK
[18:02:03] <apergos>	 there's been no changes to the swift rings in that repo for the last reported bad disk I saw in phab, so going to leave it be for now
[18:04:00] <apergos>	 well it's unmounted automagically so that's that
[18:11:55] <apergos>	 ah, I see there is a swift drive audit script that must do it
[18:12:10] <apergos>	 the umount is blocked but it took the device out of the mount table at least
[19:42:07] <godog>	 apergos: thanks for taking a look! if the umount is blocked we can reboot
[19:42:23] <apergos>	 godog: I bet it is not
[19:42:40] <apergos>	 let me have a look
[19:43:24] <apergos>	 can't tell
[19:43:33] <apergos>	 sec
[19:43:53] <apergos>	 root     10977  0.0  0.0  23624  1224 ?        D    18:01   0:01 umount -fl /srv/swift-storage/sdk1
[19:43:56] <apergos>	 nope still stuck
[19:44:05] <apergos>	 oh. and I misread, heh
[19:44:22] <apergos>	 it is indeed still blocked, I thought you were saying if it was unblocked yet
[19:44:38] <apergos>	 do you want to do the honors?
[19:45:00] <godog>	 yeah I'll kick it
[19:45:03] <apergos>	 sweet
[19:45:25] <godog>	 I'm looking into why sdk wasn't commented from fstab though, swift-drive-audit should be able to
[19:45:35] <apergos>	 maybe that's after the umount?
[19:46:12] <godog>	 yeah likely, I would have hoped not
[19:46:14] <apergos>	 yep
[19:46:22] <apergos>	 the umount must finish, then it comments out :-P
[19:47:42] <godog>	 !log reboot ms-be1033 - T183724
[19:47:43] <apergos>	 welcome back irc-cloud users :-P 
[19:47:48] <godog>	 yeah I commented it manually
[19:47:51] <apergos>	 k
[19:47:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:47:55] <stashbot>	 T183724: Degraded RAID on ms-be1033 - https://phabricator.wikimedia.org/T183724
[19:48:11] <apergos>	 I take it we don't rebalance the swift rings for dead disks these days?
[19:49:12] <godog>	 no it is usually a matter of 2/3 days
[19:49:21] <apergos>	 and not worth it
[19:49:23] <apergos>	 gotcha
[19:49:46] <apergos>	 I wonder how that will play out this week
[19:51:19] <godog>	 true, yeah might be more like a week
[19:54:18] <godog>	 !log power reset ms-be1033
[19:54:28] <godog>	 meh, it wasn't coming back
[19:54:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:31] <apergos>	 ugh
[19:54:42] <apergos>	 oh it probably never finished powering down becuse waiting for that disk somehow
[19:56:36] <godog>	 quite possible yeah
[19:58:02] <godog>	 yeah it is back
[20:00:06] <apergos>	 ah so
[20:00:21] <apergos>	 nm, I already asked you
[20:00:25] <apergos>	 have a good vacation
[20:00:45] <godog>	 apergos: you too!
[20:02:16] <icinga-wm>	 RECOVERY - HP RAID on ms-be1033 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK
[20:02:27] <icinga-wm>	 RECOVERY - puppet last run on ms-be1033 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[20:47:20] <wikibugs>	 (03PS8) 10Krinkle: Move statsv varnishkafka and service to use main Kafka cluster(s) [puppet] - 10https://gerrit.wikimedia.org/r/391705 (https://phabricator.wikimedia.org/T179093) (owner: 10Ottomata)
[20:57:22] <wikibugs>	 (03CR) 10Krinkle: [C: 031] Add wikidata and mediawiki.org to $wgLocalVirtualHosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392999 (https://phabricator.wikimedia.org/T117302) (owner: 10TerraCodes)