[08:47:10] !log admin cleanup the cloud-announce pending emails (spam)
[08:47:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:23:14] !log admin refreshed nova DB grants in clouddb2001-dev for the codfw1dev deployment
[09:23:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:27:16] !log admin cleanup `nova service-list` from old hypervisors (labtest*)
[09:27:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:27:36] !log admin last log entry refers to the codfw1dev deployment
[09:27:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:31:04] !log admin (codfw1dev) deleting a bunch of VMs in ERROR and SHUTDOWN state
[09:31:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:32:42] !log admin (codfw1dev) deleting a bunch of VMs that were running in now missing hypervisors
[09:32:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:59:44] !log admin (codfw1dev) drop missing glance images (T228972)
[09:59:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:59:48] T228972: CloudVPS: codfw1dev: refresh glance images - https://phabricator.wikimedia.org/T228972
[10:28:51] !log openstack add moritz to the project to help with tests related to T228870
[10:28:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[10:34:16] !log openstack created moritz-mds-stretch-test VM for testing T228870
[10:34:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[10:47:59] !log openstack re-enabled puppet agent in cloud-bootstrapvz-stretch (disabled by andrew 88652 minutes ago) to handle T228983
[10:48:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[10:48:02] T228983: glance: refresh stretch image to include latest security fixes - https://phabricator.wikimedia.org/T228983
[12:26:17] !log openstack delete moritz-mds-stretch-test[-2] VMs, testing is over
[12:26:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[12:32:34] !log admin eqiad1/glance: debian-9.9-stretch image deprecates debian-9.8-stretch (T228983)
[12:32:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:32:39] T228983: glance: refresh stretch image to include latest security fixes - https://phabricator.wikimedia.org/T228983
[14:06:58] !log openstack create 4 testing VMs on cloudvirt1015 T220853
[14:07:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[14:07:02] T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory - https://phabricator.wikimedia.org/T220853
[14:49:50] !log openstack running cpu and ram stress tests on cloudvirt1015 T220853
[14:49:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[14:49:53] T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory - https://phabricator.wikimedia.org/T220853
[15:09:55] Starting the planned NFS maintenance
[16:11:40] !log tools.shinken restarting shinken after NFS maintenance T224228
[16:11:41] jeh: Unknown project "tools.shinken"
[16:11:54] * jeh sighs
[16:12:20] !log shinken restarting shinken after NFS maintenance T224228
[16:12:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Shinken/SAL
[19:36:28] error: failed receiving gdi request response for mid=1 (got syncron message receive timeout error).
[19:36:28] xml.etree.ElementTree.ParseError: no element found: line 1, column 0
[19:36:28] from File "/usr/bin/job"
[19:36:34] Got a cron mail just now
[19:37:26] Something similar (possibly the same) was reported at https://phabricator.wikimedia.org/T225373
[20:19:25] Krinkle: every time I've tried to track that down its been a transient hiccup :/
[20:37:35] milimetric, nuria Is there a good time we can schedule a reboot on labsdb1012.eqiad.wmnet for a system update?
[20:38:12] jeh: as long as it’s not in the first few days of the month it’s fine
[20:38:29] jeh: is that the very exclusive lab db replica?
[20:38:31] so not Aug 1st through Aug 5th
[20:39:03] And if you’re thinking like Aug 6th or close to that frame, we’d appreciate a ping before you go ahead
[20:39:32] Usually we should be ok after like Aug 3rd but I’m building a little buffer in case we have to reimport
[20:39:55] milimetric: how about before August?
[20:39:56] milimetric: great thanks, I'd like to push it out today or tomorrow
[20:40:10] nuria: I'm not sure, I think so
[20:40:13] bd808: totally fine before Aug
[20:40:22] nuria: yes
[20:41:13] jeh: It runs a mysql instance of labsdb data right?
[20:43:58] yes, it is the clone of labsdb10{09,10,11} that are the Wiki Replica servers used by the Cloud Services community
[20:44:51] its the one y'all bought so that you could stop crushing the shared service to rebuild your data lake
[20:45:50] bd808: ok, then anytime before july 31st is fine
[21:22:13] !log tools rebooting tools-worker-1016 unresponsive
[21:22:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:42:56] ehmm.
[21:43:05] was there some major maintenance that i missed or something?
[21:43:25] There was a NFS maintenance announcement today
[21:43:38] and a couple of days ago they did some major stuff iirc
[21:44:10] tiles rendering is down since about 6-9 hourse it seems..
[21:44:25] Could not chdir to home directory /home/hartman: Stale file handle
[21:44:26] -bash: /home/hartman/.bash_profile: Stale file handle
[21:44:30] oh fun ;)
[21:47:23] thedj: looks like that's related to the NFS maintenance today. Which host are you getting the stale file handle on?
[21:47:37] maps-tiles1
[21:48:16] yeah mounts didn't come back up it seems...
[21:50:02] * thedj will just hit the restart button
[21:51:36] thedj: that'd probably best if it's an option.
[21:56:29] I think I have a problem with jsub
[21:57:01] !log maps restarted tiles1 because home dir and project/maps mounts were gone
[21:57:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL
[21:57:35] oh great.. no dice
[21:57:37] well it's not jsub but the script failing in the middle of the operation for some reason
[21:59:59] hmm, the nfs maintenance was a security thing ?
[22:00:08] cause the ticket seems private
[22:00:15] meh. not time right now
[22:00:37] thedj: yeah, it was related to updates on the NFS servers
[22:00:57] !log maps tiles1 (probably the rest too) down due to todays NFS maintenance. will have to check another time - DJ
[22:00:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL
[22:01:13] nite ppl
[22:01:35] !log tools T228573 created tools-worker-1030
[22:01:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:01:38] T228573: toolforge k8s nodes oom? - https://phabricator.wikimedia.org/T228573
[22:04:09] !log tools.mabot Update mabot to 4da7b9a.
[22:04:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.mabot/SAL
[22:10:50] thedj: I see the problem, there was a cleanup step missed during the maintenance. It looks better now
[22:10:58] I'll leave a message on their SAL
[22:11:33] !log maps cleaned up an issue following NFS maintenance -- maps NFS mounts coming back online
[22:11:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL
[22:14:55] !log maps remounted NFS /mnt/nfs/secondary-maps on maps-tiles1
[22:14:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL