[00:19:10] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-1 [+1/-0/±0] https://git.io/Jvlvq
[00:19:12] [miraheze/puppet] paladox 913ad74 - Create init.pp
[00:19:13] [puppet] paladox synchronize pull request #1226: Introduce cloud role - https://git.io/JvWWx
[00:20:22] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-1 [+0/-0/±1] https://git.io/JvlvY
[00:20:23] [miraheze/puppet] paladox 3d1a873 - Update init.pp
[00:20:25] [puppet] paladox synchronize pull request #1226: Introduce cloud role - https://git.io/JvWWx
[00:24:23] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-1 [+1/-0/±0] https://git.io/Jvlvn
[00:24:24] [miraheze/puppet] paladox 0b1f667 - Create interfaces.erb
[00:24:26] [puppet] paladox synchronize pull request #1226: Introduce cloud role - https://git.io/JvWWx
[00:24:52] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-1 [+1/-0/±0] https://git.io/JvlvW
[00:24:54] [miraheze/puppet] paladox 416fec1 - Create 50-cloud-init.cfg
[00:24:55] [puppet] paladox synchronize pull request #1226: Introduce cloud role - https://git.io/JvWWx
[00:25:56] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-1 [+0/-0/±1] https://git.io/Jvlv4
[00:25:58] [miraheze/puppet] paladox 68b4f2f - Update init.pp
[00:25:59] [puppet] paladox synchronize pull request #1226: Introduce cloud role - https://git.io/JvWWx
[00:26:21] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-1 [+1/-0/±0] https://git.io/Jvlv0
[00:26:22] [miraheze/puppet] paladox d889227 - Create 99-disable-network-config.cfg
[00:26:24] [puppet] paladox synchronize pull request #1226: Introduce cloud role - https://git.io/JvWWx
[00:28:54] PROBLEM - cp8 Current Load on cp8 is CRITICAL: CRITICAL - load average: 1.10, 2.06, 1.41
[00:30:52] RECOVERY - cp8 Current Load on cp8 is OK: OK - load average: 0.87, 1.57, 1.30
[00:42:52] [miraheze/dns] JohnFLewis pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlvp
[00:42:53] [miraheze/dns] JohnFLewis 411851f - push new ns2 entry
[00:51:23] [miraheze/puppet] Southparkfan pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlft
[00:51:25] [miraheze/puppet] Southparkfan f592623 - Remove cp2 from puppet
[00:51:43] [miraheze/puppet] Southparkfan pushed 1 commit to master [+0/-1/±0] https://git.io/Jvlfq
[00:51:44] [miraheze/puppet] Southparkfan 8d92a50 - Delete cp2 hiera file
[00:53:05] [miraheze/dns] Southparkfan pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlfm
[00:53:07] [miraheze/dns] Southparkfan b47aced - Remove cp2 from geoip pool
[00:53:25] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb
[00:53:26] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 2 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb
[00:53:30] !log shutdown cp2 nginx for transferring nginx logs to cp6:/root
[00:53:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[00:55:24] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[00:55:25] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[00:56:42] !log soft depooling of misc1 as ns2 via DNS entry change
[00:56:49] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[01:11:29] [miraheze/puppet] Southparkfan pushed 1 commit to master [+0/-0/±1] https://git.io/JvlfB
[01:11:30] [miraheze/puppet] Southparkfan d27112d - Add temp ssh key
[01:17:55] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlfz
[01:17:56] [miraheze/puppet] paladox a62568b - cp7: Increase cache size to 10g
[01:18:39] [miraheze/puppet] Southparkfan pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlfa
[01:18:41] [miraheze/puppet] Southparkfan f68ca6b - Remove temp ssh key
[01:21:02] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JvlfV
[01:21:03] [miraheze/puppet] paladox b608ef9 - cp6: Increase cache size
[01:23:29] !log decomission cp2
[01:23:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[01:23:46] wrong server and a spelling mistake, cool
[01:23:47] PROBLEM - cp2 Puppet on cp2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[ops_ensure_members]
[01:26:06] W: molly-guard: SSH session detected!
[01:26:06] Please type in hostname of the machine to shutdown: cp2.miraheze.org
[01:26:06] Good thing I asked; I won't shutdown cp2 ...
[01:26:32] didn't know yet programs can be so dumb
[01:26:50] SPF|Cloud the hostname is cp2
[01:26:56] you could say cp2 is the hostname and cp2.miraheze.org is the fqdn, but still...
[01:27:03] heh
[01:27:25] !log drop phabricator_* on db4
[01:27:32] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[01:28:38] PROBLEM - Host cp2 is DOWN: CRITICAL - Time to live exceeded (107.191.126.23)
[01:29:27] cp2 is no longer :D
[01:30:26] :P
[01:30:54] SPF|Cloud you can get rid of misc3 me thinks
[01:31:33] 'thinks'
[01:31:50] what about shutting it down and wait a few minutes to see if issues pop up?
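The molly-guard exchange above (the shutdown of cp2 refused because the FQDN was typed instead of the short hostname) comes down to a simple hostname comparison. A minimal sketch of that behaviour, not the real package (which interposes itself on shutdown/halt/reboot and only prompts when an SSH session is detected):

```shell
#!/bin/sh
# Minimal molly-guard-style confirmation. The real tool compares the
# typed name against `hostname` (the short name, "cp2"), which is why
# answering with the FQDN "cp2.miraheze.org" was rejected above.
confirm_shutdown() {
    want=$(hostname)
    printf 'Please type in hostname of the machine to shutdown: '
    read -r typed
    if [ "$typed" = "$want" ]; then
        echo "proceeding with shutdown of $want"
        # /sbin/shutdown -h now   # real action elided in this sketch
    else
        echo "Good thing I asked; I won't shutdown $want ..."
        return 1
    fi
}
```

One could also accept `hostname -f`, but the point of the guard is an exact, deliberate match before anything destructive runs.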
[01:31:51] SPF|Cloud well we've switched over to services1/2 and it seems to work
[01:31:56] sure
[01:33:06] !log shutdown misc3 - testing before permanent decom
[01:33:11] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[01:34:46] paladox: do your testing
[01:34:50] sure
[01:35:40] works :P
[01:35:55] (tested VE on meta (only clicking edit, and on test1 where i did an actual save)
[01:36:11] goodbye misc3, won't miss you
[01:36:21] lol
[01:42:13] !log decommissioned misc3
[01:42:14] [miraheze/puppet] Southparkfan pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlfb
[01:42:15] [miraheze/puppet] Southparkfan 57e0fb1 - Remove misc3
[01:42:26] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[01:49:37] Our first database server was called... db1.orain.org ;)
[02:15:20] [miraheze/mediawiki] paladox pushed 1 commit to REL1_34 [+0/-0/±1] https://git.io/JvlUU
[02:15:21] [miraheze/mediawiki] paladox f0cbfa7 - Update Echo
[02:25:27] !log increase puppet2 disk to 15G
[02:25:39] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[02:29:40] PROBLEM - lizardfs6 Puppet on lizardfs6 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki core]
[02:32:07] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:32:14] RECOVERY - lizardfs6 Puppet on lizardfs6 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[02:32:56] !log apt-get upgrade - misc4
[02:33:09] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[02:33:23] PROBLEM - Host misc3 is DOWN: PING CRITICAL - Packet loss = 100%
[02:33:33] PROBLEM - Host misc3 is DOWN: PING CRITICAL - Packet loss = 100%
[02:34:06] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:34:08] PROBLEM - Host misc3 is DOWN: PING CRITICAL - Packet loss = 100%
[02:35:06] !log stopped parsoid, removed /srv/parsoid from misc4
[02:35:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[02:36:18] [miraheze/dns] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JvlUj
[02:36:20] [miraheze/dns] paladox 52b4e76 - Remove misc3 from dns
[02:36:34] [miraheze/dns] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JvlTv
[02:36:36] [miraheze/dns] paladox 77af874 - Remove cp2 from dns
[02:38:47] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JvlTL
[02:38:48] [miraheze/puppet] paladox 7774ce0 - Fix
[03:11:32] RECOVERY - Host misc3 is UP: PING OK - Packet loss = 0%, RTA = 0.53 ms
[03:11:32] PROBLEM - misc3 Current Load on misc3 is CRITICAL: connect to address 185.52.1.71 port 5666: Connection refused; connect to host 185.52.1.71 port 5666: Connection refused
[03:11:32] PROBLEM - misc3 restbase on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:11:32] PROBLEM - misc3 electron on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:11:32] PROBLEM - misc3 Puppet on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[03:11:33] PROBLEM - misc3 SSH on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:11:33] PROBLEM - misc3 citoid on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:11:33] PROBLEM - misc3 Disk Space on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[03:11:34] PROBLEM - misc3 zotero on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:11:37] PROBLEM - misc3 Disk Space on misc3 is CRITICAL: connect to address 185.52.1.71 port 5666: Connection refused; connect to host 185.52.1.71 port 5666: Connection refused
[03:11:47] PROBLEM - misc3 zotero on misc3 is CRITICAL: connect to address 185.52.1.71 and port 1969: Connection refused
[03:11:56] RECOVERY - misc3 SSH on misc3 is OK: SSH OK - OpenSSH_7.6p1 Ubuntu-4ubuntu0.3 (protocol 2.0)
[03:12:32] PROBLEM - misc3 citoid on misc3 is CRITICAL: connect to address 185.52.1.71 and port 6927: Connection refused
[03:20:39] PROBLEM - misc3 SSH on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:09:21] PROBLEM - cp8 Current Load on cp8 is CRITICAL: CRITICAL - load average: 1.51, 2.64, 1.69
[04:11:24] PROBLEM - cp8 Current Load on cp8 is WARNING: WARNING - load average: 0.69, 1.95, 1.55
[04:13:25] RECOVERY - cp8 Current Load on cp8 is OK: OK - load average: 0.59, 1.48, 1.43
[05:04:03] PROBLEM - cp8 Current Load on cp8 is CRITICAL: CRITICAL - load average: 1.14, 2.04, 1.36
[05:06:06] PROBLEM - cp8 Current Load on cp8 is WARNING: WARNING - load average: 1.37, 1.76, 1.34
[05:08:08] RECOVERY - cp8 Current Load on cp8 is OK: OK - load average: 0.30, 1.23, 1.19
[06:27:37] RECOVERY - cp3 Disk Space on cp3 is OK: DISK OK - free space: / 3139 MB (13% inode=94%);
[08:46:31] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/mnt/mediawiki-static]
[10:59:36] !log umount /mnt/mediawiki-static ; mount /mnt/mediawiki-static on mw3
[10:59:40] ^ paladox when I tried to mount it said
[10:59:42] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[10:59:52] /sbin/mount.glusterfs: according to mtab, GlusterFS is already mounted on /mnt/mediawiki-static
[11:03:34] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:03:52] Thanks, that’s fine :£
[11:03:53] *:)
[11:20:53] RECOVERY - Host misc3 is UP: PING OK - Packet loss = 0%, RTA = 3.27 ms
[11:21:26] RECOVERY - misc3 SSH on misc3 is OK: SSH OK - OpenSSH_7.9p1 Debian-10 (protocol 2.0)
[11:33:27] [mw-config] bonnedav opened pull request #2880: Prevent crat from editing interwiki admin group - https://git.io/JvlG3
[12:01:39] Hello benjaminikuta! If you have any questions, feel free to ask and someone should answer soon.
[12:33:35] [mw-config] paladox closed pull request #2880: Prevent crat from editing interwiki admin group - https://git.io/JvlG3
[12:33:37] [miraheze/mw-config] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JvlZR
[12:33:38] [miraheze/mw-config] bonnedav ba99f3a - Prevent crat from editing interwiki admin group (#2880) Currently a wiki bureaucrat can use ManageWiki to modify the local interwiki admin group because it is not blacklisted. Due to the frankly ridiculous requirements currently in place for getting the interwiki permission I assume this is not desired. Now I don't see why this permission is restricted in the first place when editinterface already allows adding malicious content if desired, but here is the fix anyway because even though I don't agree with the rules I am never one to abuse exploits or break rules and I genuinely care for miraheze and want to see it continue to thrive for years to come. Note while I did add myself to the group on my wiki I only did it as a test, I did not change the interwiki table at all and I removed the group as soon as I confirmed it could be done, the logs should be able to confirm this if anyone is suspicious for any reason. Thank you.
[12:46:36] [landing] alex4401 opened pull request #28: Add translations to Polish - https://git.io/JvlZ5
[12:48:46] [landing] alex4401 synchronize pull request #28: Add translations to Polish - https://git.io/JvlZ5
[14:30:40] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JvlWd
[14:30:42] [miraheze/puppet] paladox b6cf5a8 - mediawiki: Increase opcache
[14:31:06] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JvlWF
[14:31:08] [miraheze/puppet] paladox 615a9bc - phabricator: Increase opcache
[14:42:13] PROBLEM - cp3 Disk Space on cp3 is WARNING: DISK WARNING - free space: / 2649 MB (10% inode=94%);
[14:45:42] !log tried upgrading phabricator to latest on master, failed and rolled back
[14:45:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[15:43:05] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 2 datacenters are down: 2a00:d880:5:8ea::ebc7/cpweb, 51.161.32.127/cpweb
[15:43:36] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 3 datacenters are down: 128.199.139.216/cpweb, 51.161.32.127/cpweb, 2607:5300:205:200::17f6/cpweb
[15:45:03] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[15:45:35] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[15:49:03] paladox: can I recommend changing the topic at the start of tonight’s downtime to an outage message? I can come up with the text
[15:49:47] Sure, Status: Maintenance in progress
[15:50:41] paladox: I’d move it to the start to avoid people not seeing it.
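The /sbin/mount.glusterfs message quoted at 10:59 is the mount helper consulting mtab before acting: mounting an already-mounted volume is refused. The umount-then-mount from that !log entry can be wrapped defensively by checking the mount table first; a sketch (the path is from the log, the helper names are illustrative):

```shell
#!/bin/sh
# Return success if $1 is listed as a mountpoint in an mtab-style file
# ($2, defaulting to /proc/mounts, which mirrors mtab on modern systems).
already_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Remount the static volume only after a successful umount, mirroring
# "umount /mnt/mediawiki-static ; mount /mnt/mediawiki-static" from the log.
remount_static() {
    mp=/mnt/mediawiki-static
    if already_mounted "$mp"; then
        umount "$mp" || return 1
    fi
    mount "$mp"    # assumes an fstab entry for the GlusterFS volume
}
```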
[15:51:40] Something like ‘miraheze.org is currently unavailable due to maintenance. This is expected to finish at 2AM UTC. If you have any questions, please ask and someone will answer when available.’
[15:52:31] That won't fit in there
[15:52:31] That would def hit the topic length limit
[15:52:46] I think maint in progress is sufficient
[15:52:53] and we only need to say maintenance, we've announced it as a site notice, and also on discord too.
[15:53:54] True
[15:54:17] I’ll be on hand for a lot of it to help anyway if people do ask
[15:54:31] although I did mean replace the topic with that
[15:56:05] We don't replace the topic when we do other status changes, why now?
[15:57:07] Because it’s a big 7 hour outage
[15:57:20] Not a simple we’ll be back ASAP
[16:03:53] Hi Voidwalker
[16:04:00] hey
[16:04:23] Voidwalker: how you doing?
[16:04:36] pretty good, but also tired
[16:05:04] I am a bit, I’m not sure I’ll still be here at 2am
[16:15:08] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/JvlBq
[16:15:10] [miraheze/services] MirahezeSSLBot 5ec1043 - BOT: Updating services config for wikis
[16:21:31] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 1 datacenter is down: 128.199.139.216/cpweb
[16:23:29] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[16:33:35] [puppet] paladox synchronize pull request #1226: Introduce cloud role - https://git.io/JvWWx
[16:33:37] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-1 [+0/-0/±1] https://git.io/JvlBj
[16:33:38] [miraheze/puppet] paladox a2ba9cc - Update init.pp
[16:35:19] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-1 [+1/-0/±1] https://git.io/JvlRf
[16:35:21] [miraheze/puppet] paladox 036d414 - Update init.pp
[16:35:22] [puppet] paladox synchronize pull request #1226: Introduce cloud role - https://git.io/JvWWx
[16:44:53] .github Miraheze/puppet
[16:44:53] https://github.com/Miraheze/puppet
[16:47:24] [puppet] paladox synchronize pull request #1226: Introduce cloud role - https://git.io/JvWWx
[16:47:25] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-1 [+0/-0/±1] https://git.io/JvlR0
[16:47:27] [miraheze/puppet] paladox a5ff731 - Update cloud.pp
[16:51:35] [puppet] paladox synchronize pull request #1226: Introduce cloud role - https://git.io/JvWWx
[16:51:36] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-1 [+0/-0/±1] https://git.io/JvlRw
[16:51:37] [miraheze/puppet] paladox 861f0b1 - Update cloud.pp
[17:00:00] [puppet] paladox synchronize pull request #1226: Introduce cloud role - https://git.io/JvWWx
[17:00:02] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-1 [+0/-0/±1] https://git.io/JvlRH
[17:00:03] [miraheze/puppet] paladox 1dee36f - Update init.pp
[17:05:32] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-1 [+0/-0/±1] https://git.io/Jvl0U
[17:05:34] [miraheze/puppet] paladox 4b9cc60 - Update cloud1.yaml
[17:05:35] [puppet] paladox synchronize pull request #1226: Introduce cloud role - https://git.io/JvWWx
[17:06:41] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-1 [+0/-0/±1] https://git.io/Jvl0I
[17:06:43] [miraheze/puppet] paladox 60fd3fb - Update cloud2.yaml
[17:06:45] [puppet] paladox synchronize pull request #1226: Introduce cloud role - https://git.io/JvWWx
[17:07:13] [puppet] paladox synchronize pull request #1226: Introduce cloud role - https://git.io/JvWWx
[17:07:14] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-1 [+0/-0/±1] https://git.io/Jvl0L
[17:07:16] [miraheze/puppet] paladox f7167e9 - Update init.pp
[17:35:04] .t utc
[17:35:05] 2020-02-14 - 17:35:04UTC
[17:51:10] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-5 [+0/-0/±1] https://git.io/JvlEn
[17:51:12] [miraheze/puppet] paladox 3126ea2 - varnish: Switch from mw[123] & lizardfs6 to mw[4567]
[17:51:13] [puppet] paladox created branch paladox-patch-5 - https://git.io/vbiAS
[17:51:15] [puppet] paladox opened pull request #1228: varnish: Switch from mw[123] & lizardfs6 to mw[4567] - https://git.io/JvlEc
[17:51:56] paladox: is that going live mid maint window?
[17:53:06] I have no idea, i'm just preparing.
[17:54:15] Kk
[17:54:36] test1wiki will probably want renaming to test2wiki at some point
[17:55:10] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-5 [+0/-0/±1] https://git.io/JvlEW
[17:55:11] [miraheze/puppet] paladox ac5895f - Update default.vcl
[17:55:13] [puppet] paladox synchronize pull request #1228: varnish: Switch from mw[123] & lizardfs6 to mw[4567] - https://git.io/JvlEc
[17:56:46] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-5 [+0/-0/±1] https://git.io/JvlEB
[17:56:47] [miraheze/puppet] paladox e8a4364 - Update init.pp
[17:56:49] [puppet] paladox synchronize pull request #1228: varnish: Switch from mw[123] & lizardfs6 to mw[4567] - https://git.io/JvlEc
[17:56:58] * hispano76 greetings :)
[17:57:05] Hey hispano76
[17:57:13] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-5 [+0/-0/±1] https://git.io/JvlE0
[17:57:14] [miraheze/puppet] paladox 6b910f8 - Update varnish.pp
[17:57:16] [puppet] paladox synchronize pull request #1228: varnish: Switch from mw[123] & lizardfs6 to mw[4567] - https://git.io/JvlEc
[17:58:51] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-6 [+0/-0/±1] https://git.io/JvlEz
[17:58:53] [miraheze/puppet] paladox ba6d719 - mediawiki: Switch mw1 to mw4 for letsencrypt and services
[17:58:54] [puppet] paladox created branch paladox-patch-6 - https://git.io/vbiAS
[17:58:56] [puppet] paladox opened pull request #1229: mediawiki: Switch mw1 to mw4 for letsencrypt and services - https://git.io/JvlEg
[17:59:26] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-6 [+0/-0/±1] https://git.io/JvlEa
[17:59:27] [miraheze/puppet] paladox cf855a0 - Update mw1.yaml
[17:59:29] [puppet] paladox synchronize pull request #1229: mediawiki: Switch mw1 to mw4 for letsencrypt and services - https://git.io/JvlEg
[18:00:27] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JvlEK
[18:00:29] [miraheze/puppet] paladox 075474a - monitoring: Remove misc[34] and cp2 from groups.conf
[18:00:57] PROBLEM - bacula1 Bacula Static on bacula1 is WARNING: WARNING: Full, 2336892 files, 199.8GB, 2020-01-30 17:58:00 (2.1 weeks ago)
[18:01:52] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JvlEP
[18:01:54] [miraheze/puppet] paladox c419208 - Update groups.conf
[18:03:14] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-7 [+0/-0/±1] https://git.io/JvlEM
[18:03:16] [miraheze/puppet] paladox 5b7bc9c - gluster: Remove mw[123] and lizardfs6 from firewall
[18:03:17] [puppet] paladox created branch paladox-patch-7 - https://git.io/vbiAS
[18:03:19] [puppet] paladox opened pull request #1230: gluster: Remove mw[123] and lizardfs6 from firewall - https://git.io/JvlED
[18:04:05] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-6 [+0/-0/±1] https://git.io/JvlE7
[18:04:07] [miraheze/puppet] paladox 3dd9b3b - Update phab.miraheze.wiki.conf
[18:04:08] [puppet] paladox synchronize pull request #1229: mediawiki: Switch mw1 to mw4 for letsencrypt and services - https://git.io/JvlEg
[18:11:52] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 1 datacenter is down: 51.161.32.127/cpweb
[18:12:26] [miraheze/mw-config] paladox pushed 1 commit to paladox-patch-3 [+0/-0/±1] https://git.io/JvluI
[18:12:28] [miraheze/mw-config] paladox 11b0215 - database: Switch to db6
[18:12:29] [mw-config] paladox created branch paladox-patch-3 - https://git.io/vbvb3
[18:12:31] [mw-config] paladox opened pull request #2881: database: Switch to db6 - https://git.io/JvluL
[18:13:51] miraheze/mw-config/paladox-patch-3/11b0215 - paladox The build passed. https://travis-ci.org/miraheze/mw-config/builds/650556367
[18:13:51] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[18:19:25] [puppet] JohnFLewis commented on pull request #1229: mediawiki: Switch mw1 to mw4 for letsencrypt and services - https://git.io/JvluZ
[18:20:08] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlun
[18:20:10] [miraheze/services] MirahezeSSLBot 26ac1b2 - BOT: Updating services config for wikis
[18:24:43] [puppet] paladox synchronize pull request #1229: mediawiki: Switch mw1 to mw4 for letsencrypt and services - https://git.io/JvlEg
[18:24:45] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-6 [+0/-0/±1] https://git.io/Jvlul
[18:24:46] [miraheze/puppet] paladox b682d11 - Update mw4.yaml
[18:25:00] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-6 [+0/-0/±1] https://git.io/Jvlu8
[18:25:01] [miraheze/puppet] paladox 23dd80d - Update jobrunner1.yaml
[18:25:03] [puppet] paladox synchronize pull request #1229: mediawiki: Switch mw1 to mw4 for letsencrypt and services - https://git.io/JvlEg
[18:25:13] [puppet] paladox edited pull request #1229: mediawiki: Switch mw1 to jobrunner1 for letsencrypt and services - https://git.io/JvlEg
[18:25:22] PROBLEM - test1 Puppet on test1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki config]
[18:25:41] [puppet] paladox synchronize pull request #1229: mediawiki: Switch mw1 to jobrunner1 for letsencrypt and services - https://git.io/JvlEg
[18:25:42] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-6 [+0/-0/±1] https://git.io/JvluE
[18:25:44] [miraheze/puppet] paladox 8b9890d - Update phab.miraheze.wiki.conf
[18:29:26] PROBLEM - cp8 Current Load on cp8 is WARNING: WARNING - load average: 1.08, 1.71, 1.15
[18:31:02] FYI: Maintenance will start in 30 minutes. Please save your edits before 19:00 UTC.
[18:31:24] RECOVERY - cp8 Current Load on cp8 is OK: OK - load average: 1.36, 1.54, 1.15
[18:47:15] FYI: Maintenance will start in 12 minutes. Please save your edits before 19:00 UTC.
[18:48:36] Almost there...
[18:50:29] SPF|Cloud: good luck! I’ll be around as much as i can until 00:00 in case users have questions but no guarantee after that. I think void is around from 20:00/21:00 onwards for the rest
[18:50:41] Much appreciated :)
[18:50:51] No problem
[18:51:33] good :)
[18:52:38] !log disable puppet on db[456]
[18:52:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[18:55:07] PROBLEM - db4 Puppet on db4 is WARNING: WARNING: Puppet is currently disabled, message: reason not specified, last run 3 minutes ago with 0 failures
[18:56:01] PROBLEM - db5 Puppet on db5 is WARNING: WARNING: Puppet is currently disabled, message: reason not specified, last run 4 minutes ago with 0 failures
[18:58:51] [miraheze/mw-config] Reception123 pushed 1 commit to master [+0/-0/±1] https://git.io/JvlzT
[18:58:53] [miraheze/mw-config] Reception123 fb418d3 - disable createwiki for db migration
[19:01:37] Reception123: it won't go through anyway (nor will a request) with $wgReadOnly
[19:01:49] well I've been told to do it so :P
[19:01:57] fair enough
[19:02:13] better safe than sorry :)
[19:02:18] true
[19:03:10] Reception123: shouldn't $wgReadOnly be set across the cluster?
[19:03:28] RhinosF1: not yet
[19:03:28] RhinosF1: Paladox is handling the migration
[19:03:54] aiming to do this one without constant read-only mode
[19:04:48] SPF|Cloud: but has the migration started yet, or are we still waiting on paladox ?
[19:05:09] SPF|Cloud: cool
[19:06:58] RO for config changes
[19:07:23] [miraheze/mw-config] Southparkfan pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlz4
[19:07:25] [miraheze/mw-config] Southparkfan f68f30c - Put all wikis in read-only mode
[19:10:08] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlzz
[19:10:10] [miraheze/services] MirahezeSSLBot e6d3842 - BOT: Updating services config for wikis
[19:10:37] Reception123: paladox isn't doing the db migration
[19:10:55] JohnLewis: oh, ok
[19:10:58] that's what I understood
[19:11:03] JohnLewis: then I assume SPF|Cloud is?
[19:11:12] Yeah
[19:11:32] !log set gluster file system to read only on lizardfs6
[19:12:47] [miraheze/mw-config] Southparkfan pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlzr
[19:12:48] [miraheze/mw-config] Southparkfan d8db61f - Disable read-only mode for now
[19:18:48] paladox: JohnLewis Reception123 SPF|Cloud im around to help for a little if needed
[19:19:01] ok, thanks
[19:19:21] Zppix: full time?
[19:19:35] RhinosF1: no i leave for work in 2 hours
[19:20:51] Zppix: kk, void should be back then. Just making sure someone's here to help users if i'm not
[19:26:05] !log dumping db4's databases to db6 in /home/dbcopy
[19:26:20] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:26:25] !log dynamically set server_id=4 on db4 and server_id=6 on db6
[19:26:37] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:26:48] !log added firewall rule on db4 to allow traffic from db6 (v4/v6) to port 3306
[19:27:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:34:27] !log [19:11:31] <+paladox> !log set gluster file system to read only on lizardfs6
[19:34:40] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:49:20] paladox: db6 contains centralauth/metawiki databases, can they be dropped?
[19:49:32] SPF|Cloud yes, i thought i did that?
[19:49:48] apparently not :)
[19:50:00] SPF|Cloud gone now
[19:55:59] PROBLEM - bacula1 Bacula Databases db4 on bacula1 is WARNING: WARNING: Diff, 75994 files, 41.57GB, 2020-01-30 19:53:00 (2.1 weeks ago)
[19:57:26] * RhinosF1 is going for half an hour in a few mins. Please ping if you need anything.
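The 19:26 !log entries describe a classic dump-then-replicate handover: dump the source into the target, keep distinct server_ids, and open the MariaDB port between the two hosts. A rough sketch of those steps; the mysqldump flags, file name, and the DB6_ADDR placeholder are assumptions, since the log records only the intent:

```shell
# Dump db4's databases straight into db6's /home/dbcopy (flags assumed;
# --single-transaction avoids locking InnoDB tables for the whole dump,
# --master-data=2 records the binlog position needed to start replication).
ssh db4 "mysqldump --all-databases --single-transaction --master-data=2" \
  | ssh db6 "cat > /home/dbcopy/db4.sql"

# Replication requires distinct server_ids; "dynamically set" in the log
# means no restart was needed:
ssh db4 "mysql -e 'SET GLOBAL server_id = 4;'"
ssh db6 "mysql -e 'SET GLOBAL server_id = 6;'"

# Allow db6 to reach db4's MariaDB port (the log notes both v4 and v6
# rules; DB6_ADDR stands in for db6's address):
ssh db4 "iptables -A INPUT -p tcp -s \$DB6_ADDR --dport 3306 -j ACCEPT"
```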
[20:04:13] !log install atop/iotop/iperf (with fw rule for latter) on db4/db6
[20:04:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[20:22:58] * RhinosF1 back-ish but mobile now until 22/23:00
[20:23:45] good
[20:28:39] !log backing up phabricator db
[20:28:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[20:29:10] [miraheze/mw-config] Southparkfan pushed 1 commit to master [+0/-0/±1] https://git.io/Jvl2z
[20:29:12] [miraheze/mw-config] Southparkfan dbbb5ee - All wikis read-only
[20:30:30] !log install mariadb-client-10.3 on phab1
[20:30:56] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-8 [+0/-0/±1] https://git.io/Jvl2o
[20:30:58] [miraheze/puppet] paladox 2f8eec5 - db6: Downgrade to mariadb 10.2
[20:30:59] [puppet] paladox created branch paladox-patch-8 - https://git.io/vbiAS
[20:31:01] [puppet] paladox opened pull request #1231: db6: Downgrade to mariadb 10.2 - https://git.io/Jvl2K
[20:34:28] !log stop mariadb on db6
[20:35:11] !log rm -rf /srv/mariadb - db6
[20:36:03] !log set read_only=1 on db4
[20:42:55] !log set read_only=1 on db5
[20:45:58] Attention all users of Miraheze: just a reminder that we are currently moving over to our new infrastructure. SRE and I are working together to make this go as smoothly and quickly as possible
[20:47:18] Zppix: don’t worry, everyone’s doing great so far! Keep going and hopefully my current luck will be shared with everyone!
[20:47:53] you don't want to put that in Discord?
[20:48:05] ATTENTION: Due to unforeseen circumstances, it is not possible to perform this migration while keeping the wikis online, since that can cause database loss. Since it is very important to avoid data loss and errors, we are using another strategy for this upgrade. However, as a result of this, all wikis may not be readable until 22:30 UTC.
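The "set read_only=1" entries above are the MariaDB side of the freeze; note the flag does not stop sessions holding the SUPER privilege, which is why the wikis were separately put in read-only mode via mw-config. The commands behind such a log entry would be along these lines (a fragment against a live server, shown for illustration):

```shell
# Make the server refuse writes from ordinary clients
# (sessions with the SUPER privilege are exempt):
mysql -e "SET GLOBAL read_only = 1;"

# Verify the flag took effect:
mysql -e "SELECT @@global.read_only;"
```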
[20:48:14] hispano76: {{done}} [20:48:27] Oh, you're way ahead of me. XD [20:48:36] The original maintenance window (until 02:00 UTC) is still in effect and we urge you to keep local copies of your edits. [20:49:18] .t utc [20:49:19] 2020-02-14 - 20:49:19UTC [20:49:49] paladox: good luck [20:53:04] !log shutdown db4 mariadb [20:55:10] PROBLEM - db4 MySQL on db4 is CRITICAL: Can't connect to MySQL server on '81.4.109.166' (115) [20:55:32] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4208 bytes in 0.022 second response time [20:55:39] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 51.161.32.127/cpweb, 2607:5300:205:200::17f6/cpweb [20:55:43] PROBLEM - cp8 Varnish Backends on cp8 is CRITICAL: 4 backends are down. lizardfs6 mw1 mw2 mw3 [20:55:56] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4210 bytes in 0.014 second response time [20:55:57] ^ all expected [20:56:16] PROBLEM - mw3 MediaWiki Rendering on mw3 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4212 bytes in 0.116 second response time [20:56:20] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 51.161.32.127/cpweb, 2607:5300:205:200::17f6/cpweb [20:56:21] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 4 backends are down. 
lizardfs6 mw1 mw2 mw3 [20:56:29] Guessed, thanks Zppix [20:56:42] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4212 bytes in 0.021 second response time [20:56:42] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4212 bytes in 0.022 second response time [20:56:44] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 74% [20:56:45] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 86% [20:56:46] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 4 backends are down. lizardfs6 mw1 mw2 mw3 [20:56:53] icinga-miraheze: its okay i promise [20:57:15] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 86% [20:58:14] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-9 [+0/-0/±1] https://git.io/JvlaO [20:58:15] [miraheze/puppet] paladox dfa7462 - Redirect all wikis to miraheze.org [20:58:17] [puppet] paladox created branch paladox-patch-9 - https://git.io/vbiAS [20:58:18] [puppet] paladox opened pull request #1232: Redirect all wikis to miraheze.org - https://git.io/Jvla3 [21:03:57] PROBLEM - bacula1 Bacula Databases db5 on bacula1 is WARNING: WARNING: Diff, 478 files, 63.37GB, 2020-01-30 21:00:00 (2.1 weeks ago) [21:14:14] Alright im off to work [21:14:29] good luck SPF|Cloud paladox try not to rm -rf / too many things [21:14:36] thanks and lol [21:34:46] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 58% [21:40:48] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 74% [21:57:17] paladox: how things going? [21:57:32] We're still syncing the db's. [21:57:43] Cool [21:58:24] paladox: i can probably be back on laptop within the hour. Not sure how long i’ll be awake though.
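The throughput figures quoted further down (330G total, 75G copied in the first 24 minutes) imply the remaining-time arithmetic the team relied on. A quick shell check, using only numbers taken from the log:

```shell
#!/bin/sh
# ETA arithmetic for the database copy, figures from the log:
# 330G total, 75G done after 24 minutes.
total_gb=330
done_gb=75
elapsed_min=24

# Remaining time at the observed rate (integer minutes).
eta_min=$(( (total_gb - done_gb) * elapsed_min / done_gb ))
echo "ETA: ${eta_min} minutes"   # about 81 minutes of copying left
```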
[21:58:31] ok [21:58:41] Voidwalker: around if i do fall asleep for support? [21:58:44] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 52% [22:00:44] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 79% [22:01:46] as long as my connection doesn't die again, yeah :P [22:01:58] Voidwalker: cool [22:02:58] Voidwalker: my laptop is directly connected to the router with a long cable... [22:04:02] while losing wi-fi shouldn't stop the actual migration (running inside 'screen', which means that it won't be killed if the ssh session is closed), it would be very inconvenient during such a huge migration [22:05:18] Yeah lets hope it stays up [22:07:24] yeah, I'm connected via ethernet, but it's still bad [22:07:42] or, I should be, but it looks like I'm not anymore [22:08:39] Oh [22:08:46] so, we're moving 330G [22:09:07] SPF|Cloud: for all 400 wikis [22:09:08] ?? [22:09:16] 400 wikis?! [22:09:17] 75G done in 24 minutes [22:09:50] paladox: 4000 [22:10:04] ok [22:10:08] I can’t type apparently [22:10:24] SPF|Cloud: that’s pretty fast [22:10:30] whereas with out initial plan it was 6GB per hour [22:10:37] with our* [22:10:46] That’s excellent then! [22:11:28] RhinosF1: I must admit my experience shuffling clusters and databases around dates back to times where we didn't have as many wikis [22:12:37] and where downtime for moving wikis was... acceptable regardless. Unfortunately, if we would do the move with the initial plan, it would have taken 42 hours before all wikis were moved, and for the rest of the plan to succeed all MariaDB binary logs would have to stay at db4, which might not have been possible due to the lack of disk space. [22:13:13] SPF|Cloud: ah. That must have been a while back. [22:13:57] 2000 wikis at most [22:14:19] ? 
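SPF|Cloud's point above is the standard pattern for long migrations: run the copy inside `screen` so a dropped SSH session (or Wi-Fi) does not kill it. A sketch of the pattern; the session name and rsync arguments are illustrative, not taken from the log:

```shell
# Sketch only: start the long-running copy in a detached screen session
# so it survives SSH disconnects. Names and paths are illustrative.
screen -dmS dbmigration rsync -a /srv/mariadb/ db6:/srv/mariadb/

screen -ls               # list running sessions
screen -r dbmigration    # reattach after reconnecting
```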
From a technical perspective (and with fewer tables per database), that does make a huge difference [22:16:27] It will do [22:16:56] The other database cluster, on db5, consists of very few wikis, so in the meantime I'm dumping those (mysqldump) and copying them to db6. Since db4 is using an older MariaDB version, paladox will ensure db4's databases are upgraded where necessary, then we can start db6 with all those thousands of wikis [22:17:27] SPF|Cloud i think we have to start mariadb, then do mysql_upgrade. [22:17:37] at least mysql_upgrade requires a working mariadb service. [22:17:55] Sounds like a good plan [22:18:59] paladox: yes, mariadb needs to be started :) [22:19:15] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 6% [22:23:15] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 83% [22:24:10] Update from SRE: we have currently migrated 33% of the wikis to the new server, the rest of the wikis are still being migrated. It might take about 70 minutes to move the rest of the wikis. [22:29:21] 11:30 pm here, phew.. [22:29:30] 10:30 here :P [22:29:57] The maintenance window lasts until 3pm for me, curious how long we need [22:30:25] heh [22:32:16] Lasts to 9:00pm for me. [22:36:21] 47% now [22:41:06] PROBLEM - db5 Disk Space on db5 is WARNING: DISK WARNING - free space: / 17315 MB (9% inode=99%); [22:41:48] ^ ack and monitored [22:48:40] hi flu-pm [22:48:55] 190G/330G now [22:49:04] yey [22:49:36] Hi [22:50:19] [miraheze/dns] paladox pushed 1 commit to paladox-patch-3 [+0/-0/±1] https://git.io/JvlrJ [22:50:21] [miraheze/dns] paladox 2e35031 - Migrate from cp4 to cp6/cp7 [22:50:22] [dns] paladox created branch paladox-patch-3 - https://git.io/vbQXl [22:50:24] [dns] paladox opened pull request #126: Migrate from cp4 to cp6/cp7 - https://git.io/JvlrU [22:50:36] flu-pm: how are you doing?
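The c2 move SPF|Cloud outlines above (dump the small db5 cluster with mysqldump, copy to db6, import, then run mysql_upgrade once the service is up) could look roughly like this. Database names and paths are illustrative; only the tool names and the ordering come from the log:

```shell
# Sketch only: move one small wiki database from db5 to db6.
# 'examplewiki' and the /root paths are hypothetical placeholders.
mysqldump --single-transaction examplewiki | gzip > /root/examplewiki.sql.gz
scp /root/examplewiki.sql.gz db6:/root/

# On db6, once mariadb has been started:
zcat /root/examplewiki.sql.gz | mysql examplewiki

# mysql_upgrade requires a running server, as paladox notes in the log.
mysql_upgrade
```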
[22:51:26] i'm fine, just wanted to add my wiki to gazetteer, but ran into the maintenance [22:51:51] now watching here out of curiosity :-) [22:53:19] nice [22:54:01] db5 is slowly but surely running out of space.. [22:54:16] ack [22:54:46] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 58% [22:55:44] [dns] paladox closed pull request #126: Migrate from cp4 to cp6/cp7 - https://git.io/JvlrU [22:55:46] [miraheze/dns] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlrn [22:55:47] [miraheze/dns] paladox 30aa383 - Migrate from cp4 to cp6/cp7 (#126) [22:55:49] [miraheze/dns] paladox deleted branch paladox-patch-3 [22:55:50] [dns] paladox deleted branch paladox-patch-3 - https://git.io/vbQXl [22:56:47] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 79% [22:57:10] RECOVERY - db5 Disk Space on db5 is OK: DISK OK - free space: / 23620 MB (12% inode=99%); [22:59:48] PROBLEM - ns1 Puppet on ns1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[gdnsd-syntax] [23:04:23] PROBLEM - misc1 Puppet on misc1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[gdnsd-syntax] [23:06:41] 70% now [23:06:57] going good [23:08:18] yes, and all wikis from c2 (https://github.com/miraheze/mw-config/blob/master/Database.php#L5) are backed up and their dumps have been copied to db6 [23:08:18] [ mw-config/Database.php at master · miraheze/mw-config · GitHub ] - github.com [23:08:52] SPF|Cloud: that's great!
[23:10:42] we're using 131GB of disk space for databases on db5, yet all its dumps combined are only 33GB large [23:10:56] now that's an amazing reduction [23:11:08] that's a very big reduction [23:11:44] RECOVERY - ns1 Puppet on ns1 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [23:12:24] RECOVERY - misc1 Puppet on misc1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:13:09] RhinosF1: it is [23:13:28] :) [23:13:34] in fact, I'm wondering whether our compression (which we do inside MediaWiki) is actually working.. [23:14:09] hmm [23:14:19] [miraheze/dns] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlrd [23:14:20] [miraheze/dns] paladox 4da415c - Fix by not using ipv6 addresses for cp6/cp7 [23:15:41] SPF|Cloud it is, it just appears that you have to dump, delete and reimport for you to actually see the reduction [23:16:19] oh well, now we can [23:16:52] 78%... [23:17:35] so hopefully done at 23:45 [23:19:46] [miraheze/dns] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JvloI [23:19:48] [miraheze/dns] paladox 4fb3eb5 - Use only cp6 not cp6 and cp7 [23:20:32] "mysqld must keep an open file handle for each file-per-table tablespace, which may impact performance if you have numerous tables in file-per-table tablespaces." [23:22:10] paladox: must say, the new servers are performing awesome [23:22:18] :) [23:22:29] SPF|Cloud well it is a 3.6ghz cpu with turbo to 4ghz [23:22:39] and also the hdd's are sata i think? [23:23:06] I can't say it will stay like this if there are more VMs running (causing lots of random i/o) but I don't feel I'm working with spinning disks [23:23:14] [miraheze/mw-config] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlos [23:23:16] [miraheze/mw-config] paladox 49b5d6a - Remove cp2 [23:23:44] PROBLEM - ns1 Puppet on ns1 is CRITICAL: CRITICAL: Puppet has 1 failures.
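The manual quote above refers to InnoDB's file-per-table mode, which is also what makes the dump/delete/reimport savings discussed earlier actually reclaimable: with one .ibd file per table, dropping or rebuilding a table returns space to the OS, at the cost of an open file handle per tablespace. A typical my.cnf fragment showing the trade-off (values illustrative, not Miraheze's actual configuration):

```ini
[mysqld]
# One .ibd file per InnoDB table: space is returned to the OS when a
# table is dropped or rebuilt, at the cost of one file handle each.
innodb_file_per_table = 1

# With thousands of wiki databases, raise the handle budget accordingly.
open_files_limit  = 65535
table_open_cache  = 10000
```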
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[gdnsd-syntax] [23:24:23] PROBLEM - misc1 Puppet on misc1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[gdnsd-syntax] [23:24:35] cpu hasn't been the bottleneck for databases, but it has been for mediawiki [23:25:53] [miraheze/mw-config] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JvloG [23:25:55] [miraheze/mw-config] paladox f51c93e - Add cp6/cp7 to wgCdnServers Also add ipv6 addresses for cp3 and cp8 [23:26:12] paladox: don't need to add v6 [23:26:21] SPF|Cloud why? [23:26:58] traffic between cache proxies and mediawiki is v4 only + only one address per cache proxy is needed, otherwise you are purging objects twice [23:28:45] [miraheze/dns] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/Jvloc [23:28:47] [miraheze/dns] paladox b7f0853 - fix [23:28:57] PROBLEM - misc1 GDNSD Datacenters on misc1 is UNKNOWN: NRPE: Unable to read output [23:30:17] RECOVERY - ns1 Puppet on ns1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:30:53] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 51.77.107.210/cpweb, 2001:41d0:800:1056::2/cpweb, 51.161.32.127/cpweb, 2607:5300:205:200::17f6/cpweb [23:30:59] RECOVERY - misc1 Puppet on misc1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:37:14] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 59% [23:38:01] 312/330G done [23:39:12] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 81% [23:39:25] SPF|Cloud: last moments :) [23:40:06] Thanks to #ramnode for their service throughout our use of you in production.
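SPF|Cloud's objection is that MediaWiki sends a cache purge to every entry in $wgCdnServers, so listing both the IPv4 and IPv6 address of the same cache proxy purges each object twice. A sketch of the intended shape of the mw-config change (addresses illustrative, not the real cp6/cp7 ones):

```php
<?php
// mw-config sketch: exactly one (IPv4) entry per cache proxy, so each
// object is purged once per proxy. Addresses are illustrative.
$wgCdnServers = [
    '192.0.2.6', // cp6
    '192.0.2.7', // cp7
];
```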
[23:46:18] [puppet] paladox closed pull request #1228: varnish: Switch from mw[123] & lizardfs6 to mw[4567] - https://git.io/JvlEc [23:46:20] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±4] https://git.io/Jvlor [23:46:21] [miraheze/puppet] paladox ac4cba1 - varnish: Switch from mw[123] & lizardfs6 to mw[4567] (#1228) [23:46:23] [puppet] paladox deleted branch paladox-patch-5 - https://git.io/vbiAS [23:46:24] [miraheze/puppet] paladox deleted branch paladox-patch-5 [23:46:45] sent 356,504,784,172 bytes received 19,995,402 bytes 49,356,237.22 bytes/sec [23:46:45] total size is 356,339,474,394 speedup is 1.00 [23:46:55] finished! one minute later than anticipated [23:47:35] SPF|Cloud: still going good, what next? [23:48:06] paladox: please do the needful :) [23:48:11] PROBLEM - cp8 HTTPS on cp8 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4142 bytes in 0.359 second response time [23:48:15] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlo6 [23:48:17] [miraheze/puppet] paladox 07519ab - Fix [23:48:18] SPF|Cloud sure [23:48:26] fingers crossed.. [23:49:23] !log starting mariadb on db6 [23:49:30] SPF|Cloud: do we have an Estimated Completion Time or still before 02:00 [23:49:58] RhinosF1: I cannot give an ETA, only thing I can say is that we want to finish this within the window [23:50:17] okay, let's hope then!
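The rsync summary pasted above is internally consistent: dividing the byte counts by the reported rate gives the wall-clock duration of the copy, which matches the roughly two hours between the "330G" announcement and "finished!". A quick arithmetic check using only the log's own numbers:

```shell
#!/bin/sh
# Verify the rsync summary from the log:
#   sent 356,504,784,172 bytes  received 19,995,402 bytes  49,356,237.22 bytes/sec
sent=356504784172
received=19995402
rate=49356237            # bytes/sec, fractional part dropped

duration_s=$(( (sent + received) / rate ))
echo "copy took about $(( duration_s / 60 )) minutes"   # ~120 minutes, i.e. ~2 hours
```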
[23:50:34] * RhinosF1 doesn't think he'll be awake much longer tbh [23:52:10] RECOVERY - cp8 HTTPS on cp8 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 1532 bytes in 0.436 second response time [23:52:28] PROBLEM - cp8 Stunnel Http for mw6 on cp8 is UNKNOWN: NRPE: Command 'check_stunnel_mw6' not defined [23:52:45] PROBLEM - cp4 Stunnel Http for test2 on cp4 is UNKNOWN: NRPE: Command 'check_stunnel_test2' not defined [23:52:45] PROBLEM - cp4 Stunnel Http for mw4 on cp4 is UNKNOWN: NRPE: Command 'check_stunnel_mw4' not defined [23:52:48] PROBLEM - cp8 Stunnel Http for mw4 on cp8 is UNKNOWN: NRPE: Command 'check_stunnel_mw4' not defined [23:52:57] PROBLEM - cp8 Stunnel Http for mw5 on cp8 is UNKNOWN: NRPE: Command 'check_stunnel_mw5' not defined [23:53:06] PROBLEM - cp8 Stunnel Http for mw7 on cp8 is UNKNOWN: NRPE: Command 'check_stunnel_mw7' not defined [23:53:14] PROBLEM - cp8 Stunnel Http for test2 on cp8 is UNKNOWN: NRPE: Command 'check_stunnel_test2' not defined [23:53:19] PROBLEM - cp4 Stunnel Http for mw7 on cp4 is UNKNOWN: NRPE: Command 'check_stunnel_mw7' not defined [23:53:21] PROBLEM - cp4 Stunnel Http for mw6 on cp4 is UNKNOWN: NRPE: Command 'check_stunnel_mw6' not defined [23:53:24] PROBLEM - cp4 Stunnel Http for mw5 on cp4 is UNKNOWN: NRPE: Command 'check_stunnel_mw5' not defined [23:53:52] can be ignored, we're migrating the icinga server after we've finished with mw [23:54:03] ok [23:54:18] paladox: how's it going? [23:54:23] paladox: assuming the technical issues message on wikis is expected. [23:54:27] SPF|Cloud fine! Still starting up. [23:54:40] RhinosF1 technical message? [23:54:45] systemd tells me the service has been started fine :P [23:55:01] paladox: https://media.discordapp.net/attachments/407537962553966603/678025761101316106/AX3NTtQMLttbAAAAAElFTkSuQmCC.png [23:55:11] oh, heh [23:55:29] it's a good sign actually.. 
[23:56:03] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/Jvlo1 [23:56:04] means mariadb has been started and is responding to connections, just not fully operational [23:56:05] [miraheze/puppet] paladox 307aa3c - Add healthcheck to mw7 [23:56:23] SPF|Cloud: that is a good sign then [23:56:53] RhinosF1 fixed [23:57:42] paladox: meh, just telling you. [23:58:33] PROBLEM - cp3 Stunnel Http for misc2 on cp3 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 2319 bytes in 0.544 second response time [23:58:47] PROBLEM - misc2 HTTPS on misc2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 2319 bytes in 0.039 second response time [23:59:01] PROBLEM - cp8 Stunnel Http for misc2 on cp8 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 2319 bytes in 0.306 second response time [23:59:27] aha, our bottleneck is i/o now [23:59:57] PROBLEM - cp8 HTTPS on cp8 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4138 bytes in 0.374 second response time