[10:06:00] !log mailman created & delegated DNS zone `lists.wmcloud.org` (T278358)
[10:06:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Mailman/SAL
[10:06:04] T278358: Delegate lists.wmcloud.org domain to be able to add DNS DKIM records - https://phabricator.wikimedia.org/T278358
[10:08:49] !log admin upgrading kernel on cloudcephmon2003-dev and reboot (T274565)
[10:08:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:08:53] T274565: [ceph] Test and upgrade to kernel ~15 - https://phabricator.wikimedia.org/T274565
[10:18:23] !log admin upgrading kernel on cloudcephosd2002-dev and reboot (T274565)
[10:18:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:18:28] T274565: [ceph] Test and upgrade to kernel ~15 - https://phabricator.wikimedia.org/T274565
[10:24:10] !log admin upgrading kernel on cloudcephosd2003-dev and reboot (T274565)
[10:24:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:24:14] T274565: [ceph] Test and upgrade to kernel ~15 - https://phabricator.wikimedia.org/T274565
[10:31:08] !log admin kernel upgrade on osds on codfw done, running performance tests (T274565)
[10:31:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:31:13] T274565: [ceph] Test and upgrade to kernel ~15 - https://phabricator.wikimedia.org/T274565
[12:51:05] !log tools create cinder volume `tools-aptly-data` (T278354)
[12:51:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:51:11] T278354: Toolforge: migrate services node to Debian Buster - https://phabricator.wikimedia.org/T278354
[12:58:38] !log tools created VM `tools-services-05` as Debian Buster (T278354)
[12:58:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:58:43] T278354: Toolforge: migrate services node to Debian Buster - https://phabricator.wikimedia.org/T278354
[13:31:08] !log tools point aptly clients to `tools-services-05.tools.eqiad1.wikimedia.cloud` (hiera change) (T278354)
[13:31:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:31:14] T278354: Toolforge: migrate services node to Debian Buster - https://phabricator.wikimedia.org/T278354
[13:33:39] !log tools shutdown tools-sge-services-04 (T278354)
[13:33:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:36:37] !log tools shutdown tools-sge-services-03 (T278354)
[13:36:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:36:42] T278354: Toolforge: migrate services node to Debian Buster - https://phabricator.wikimedia.org/T278354
[14:31:04] Hi everyone, here I am again, but now with trouble setting up a puppetmaster following the tutorial https://wikitech.wikimedia.org/wiki/Help:Standalone_puppetmaster
[14:32:25] when trying to do this for the instance maps-puppetmaster01.maps-experiments.eqiad1.wikimedia.cloud, apache2 wasn't starting
[14:35:35] is there any reason for that? I'm going to destroy and start a new instance in a few minutes and retry the tutorial. But I thought you would like to know what the error was when running the puppet agent twice on the fresh puppetmaster machine
[14:36:30] https://www.irccloud.com/pastebin/eoSMBn4R/
[14:38:15] thesocialdev: what's in the apache2 syslog? that's just saying "it failed" but the syslog likely contains why
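A minimal sketch of where the actual failure reason usually lives on a Debian instance like this one, assuming the stock apache2 packaging; the paths and unit names below are the Debian defaults, not anything confirmed from this particular VM:

```bash
# What systemd recorded about the failed start
sudo systemctl status apache2.service
sudo journalctl -u apache2 --no-pager --since "1 hour ago"

# Apache's own startup errors (usually the most specific message)
sudo tail -n 50 /var/log/apache2/error.log

# Validate the configuration without touching the service
sudo apachectl configtest
```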
[14:38:42] https://www.irccloud.com/pastebin/yeSmUaiO/
[14:39:43] uhh, what about /var/log/apache2/error.log?
[14:41:40] * Majavah needs to leave for a moment, hopefully others are around to help
[14:41:51] sorry, I'm restarting the process and I have deleted the instance. If it happens again I'll post more information.
[15:16:25] !log tools.refill-api deleting pods to restart service T278211
[15:16:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.refill-api/SAL
[15:16:30] T278211: Refill tool stuck "waiting for an available worker" - https://phabricator.wikimedia.org/T278211
[15:20:53] thesocialdev: The puppetization for standalone may have been affected by upstream changes. The error you were seeing sounds like https://stackoverflow.com/questions/48525092/docker-httpd-configuration-error-no-mpm-loaded The fact that there's an error at all suggests that maybe the puppet role needs updates. If all else fails, you might be able to update the config by hand to fix it at first
[15:23:14] thanks bstorm, I wasn't sure what was going on and tried recreating the puppetmaster. Right now I'm facing odd behaviour with the new instance: it has been scheduled to build for the past 40 minutes. Once it's up I'll try the process again, and if the problem happens again I'll fix it locally.
[15:24:06] If it isn't building, that sounds like there's some error or something...are you over quota?
[15:27:54] bstorm: quota is fine. I assumed that because I created and deleted an instance quickly I was facing some rate limiting
[15:28:16] That shouldn't trigger rate limiting
[15:28:32] which project?
[15:28:45] oh I see, maps-experiments
[15:30:21] thesocialdev: there's something broken there that we'll fix. I'm running to a meeting atm
[15:30:42] oh okay, thanks bstorm
[15:31:32] thesocialdev: I'll take a look, I've been working on new builds over the last few days
[15:33:41] thanks @andrewbogott
[15:35:14] thesocialdev: I think you should just kill it and try again. I'm curious, but you don't need to wait for me to satisfy my curiosity unless it happens repeatedly.
[15:36:42] * thesocialdev recreating instance
[15:37:05] yeah @andrewbogott no issue now, instance created
[15:37:13] :/
[15:37:58] thesocialdev: I think maybe you had a name collision, did you delete a VM with the same name before creating the stuck one?
[15:38:49] can the name collision happen even if the build doesn't happen?
[15:39:21] I changed the name when trying to create the instance that failed to build
[15:39:24] probably — but I'm talking about the stuck build from an hour ago, not the one you did just now (which seems fine)
[15:39:51] There are some race conditions; it's best to wait a few minutes after deleting a VM before recreating with the same name.
[15:39:57] I'm not 100% sure that's what happened
[15:40:04] But anyway, it sounds like you're un-stuck now
[15:40:16] That would be weird, because I changed the name; I don't think the collision could have happened the first time
[15:40:32] Yes, unblocked until proved wrong. Thanks!
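If the failure really was the "no MPM loaded" problem bstorm links above, a rough sketch of the by-hand workaround on Debian, assuming the stock apache2 module layout; the module name is illustrative, and the puppet role may overwrite any manual change on the next agent run:

```bash
# See which MPM (if any) is currently enabled; exactly one should be
ls /etc/apache2/mods-enabled/ | grep -i mpm

# Enable one MPM (event is the Debian default), check the config, restart
sudo a2enmod mpm_event
sudo apachectl configtest && sudo systemctl restart apache2
```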
[16:05:26] !log tools failed over the tools grid to the shadow master T277653
[16:05:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:05:32] T277653: Toolforge: migrate grid to Debian Buster - https://phabricator.wikimedia.org/T277653
[16:18:18] !log tools icinga-downtime toolschecker for 2h
[16:18:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:20:20] !log tools rebuilding tools-sgegrid-master VM as debian buster (T277653)
[16:20:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:20:24] T277653: Toolforge: migrate grid to Debian Buster - https://phabricator.wikimedia.org/T277653
[16:30:05] !log tools.openstack-browser Added Majavah as co-maintainer
[16:30:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.openstack-browser/SAL
[16:38:10] I am running http://meta.wikimedia.org/wiki/WM-Bot version wikimedia bot v. 2.8.1.0 [libirc v. 1.0.3] my source code is licensed under GPL and located at https://github.com/benapetr/wikimedia-bot I will be very happy if you fix my bugs or implement new features
[16:38:10] @help
[16:38:13] @op
[16:55:06] here I am again for maps-experiments. I can't get the certificates to validate following https://wikitech.wikimedia.org/wiki/Help:Standalone_puppetmaster#Step_2:_Setup_a_puppet_client
[16:55:30] after doing every step without errors, I run puppet agent one last time and get `certificate verify failed (certificate revoked)`
[16:58:49] thesocialdev: are you re-using an instance name? is that on the global puppetmaster or a project-local one?
[16:59:24] I created maps-puppetmaster02.maps-experiments.eqiad1.wikimedia.cloud
[16:59:29] !log tools.openstack-browser-dev deploying https://phabricator.wikimedia.org/D1190
[16:59:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.openstack-browser-dev/SAL
[16:59:52] and I'm trying to point the other instances in the maps-experiments project to the new puppet-master
[17:00:48] a possibly useful piece of information: on the puppetmaster, apache2 was failing because the conf file is being truncated by upstream
[17:01:10] I purged and reinstalled apache2 and it worked again
[17:02:16] are they pointing to the correct puppetmaster? `grep master /etc/puppet/puppet.conf`
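For the `certificate verify failed (certificate revoked)` error reported above, the usual recovery when a client certificate has gone stale is to clean it on the puppetmaster and wipe the client's cached SSL state so a fresh certificate is requested; this mirrors the `puppet cert clean` step Majavah suggests below. A rough sketch, assuming the Debian-packaged Puppet used on these instances; the FQDN is only a placeholder for the affected client:

```bash
# On the standalone puppetmaster: revoke and remove the stale client cert
sudo puppet cert clean <client-fqdn>   # e.g. an instance in maps-experiments.eqiad1.wikimedia.cloud

# On the client: discard cached certs and CRL, then request a new cert
sudo rm -rf "$(sudo puppet config print ssldir)"
sudo puppet agent --test
```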
[17:02:32] Majavah yes
[17:03:16] https://www.irccloud.com/pastebin/uqh2IPKM/
[17:03:56] I can run every command on step 2 https://wikitech.wikimedia.org/wiki/Help:Standalone_puppetmaster#Step_2:_Setup_a_puppet_client
[17:04:25] when I run puppet agent one last time on the client, it errors with certificate revoked
[17:05:44] weird, on the puppetmaster I'd try `sudo puppet cert clean full-instance-name`
[17:06:23] I did that and repeated the procedure
[17:06:25] same result
[17:08:16] hmm, in that case I have no idea what's causing this :(
[17:08:48] !log tools.openstack-browser deploying https://phabricator.wikimedia.org/D1190
[17:08:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.openstack-browser/SAL
[17:15:43] !log admin refreshing puppet compiler facts for tools project
[17:15:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:46:10] !log tools rebooting tools-sgeexec-* nodes to account for new grid master (T277653)
[17:46:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:46:17] T277653: Toolforge: migrate grid to Debian Buster - https://phabricator.wikimedia.org/T277653
[18:20:48] job ID 9999796 on tools.svgtranslate appears to be stuck in the deletion state. Can a Toolforge admin kill it pweez?
[18:26:50] bstorm: repeating musikanimal's request now that you can see this channel: "job ID 9999796 on tools.svgtranslate appears to be stuck in the deletion state. Can a Toolforge admin kill it pweez?"
[18:27:22] Yup, I was split from the channel before, so I didn't see it
[18:27:36] I'll get it...and probably others if I can find em
[18:27:55] done
[18:27:58] on that one
[18:34:51] job 9999796 on tools.svgtranslate still appears stuck on my end. Looks like this might be a widespread issue? I have 4 jobs stuck on tools.musikbot as well
[18:45:29] musikanimal: there was a grid outage; you may need to stop/restart things
[18:46:34] I see. I tried running `qdel` on these jobs; they are stuck in the `dr` state. When this happened before I needed someone with root to intervene
[18:46:55] ok — that might be a bstorm question
[18:47:00] musikanimal: I'll take a look in a couple min
[18:47:30] okay thanks! for tools.musikbot it's jobs 9999690, 9999770, 9999771 and 9999866
[19:03:40] !log admin deleting all unused (per wmcs-imageusage) Jessie base images from Glance
[19:03:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[19:25:02] !log tools.lexeme-forms deployed 77328e559d (optional forms)
[19:25:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[19:26:54] musikanimal: forcing deletion is working, so I'm killing the jobs now
[19:30:02] thanks!
[19:30:39] !log tools forced deletion of all jobs stuck in a deleting state T277653
[19:30:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:30:44] T277653: Toolforge: migrate grid to Debian Buster - https://phabricator.wikimedia.org/T277653
[19:30:51] I found a bunch of other ones
[19:34:39] It looks like all jobs are now running or forced to wait because the user launched more than their limit of jobs (which is fine).
[19:36:40] i think 1492646 and 3608591 are stuck as well
[19:46:48] Yeah, I see the d status...why is that showing up now and not before?
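For jobs like the ones above that stay in the `dr` (delete requested) state after the grid failover, a plain `qdel` from the tool account is often not enough; the forced deletion mentioned in the log is roughly the following, assuming Son of Grid Engine as run on Toolforge and operator (manager/root) privileges; the job ID is one taken from the log for illustration:

```bash
# Find jobs whose state column includes a pending delete request
qstat -u '*' | awk '$5 ~ /d/'

# Force-remove the job record; this can leave orphan processes on the
# exec node, so check the node afterwards if the job was still running
sudo qdel -f 9999796
```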
[19:46:55] I had grepped for it hrm
[19:46:57] i deleted them
[19:47:00] after that
[19:47:29] That makes me wonder if deletion isn't working for some reason
[19:47:44] i deleted three other jobs, that worked fine
[19:47:52] That's good :)
[19:48:49] Were they also on continuous queues?
[19:49:01] yes, all of them
[19:49:12] Ok, well, I force deleted them
[19:49:31] great, thanks
[19:49:39] Hopefully there isn't some odd pattern lingering or an exec node that is still suffering some kind of issue.
[19:49:45] I'll keep an eye out
[19:50:20] looks healthy again atm
[20:25:54] !log tools.wikibugs Force deleted stuck wb2-grrrrit job 22577
[20:25:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[20:26:15] !logs tools.wikibugs restart gerrit listener
[20:27:02] Majavah: "!logs" typo there ^
[20:27:18] bd808: oops, thanks
[20:27:43] !log tools.wikibugs restart gerrit listener
[20:27:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[20:57:17] !log toolhub Updated to bd808/toohub-beta:search @ 0c778de74c7a
[20:57:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolhub/SAL
[22:01:01] !log quarry restarting web interface for a small fix for the database field display T264254
[22:01:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[22:01:06] T264254: Prepare Quarry for multiinstance wiki replicas - https://phabricator.wikimedia.org/T264254
[22:02:59] !log quarry restarting celery worker processes to fix connection cleanup T264254
[22:03:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[22:15:30] !log quarry removing the querykiller role T264254
[22:15:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[22:15:34] T264254: Prepare Quarry for multiinstance wiki replicas - https://phabricator.wikimedia.org/T264254
[22:23:04] wow Python 3 is up to 67%
[22:24:07] I'm not sure I believe JavaScript is only 3% though
[22:29:34] oh, it was 68% last year, I missed that
[23:18:29] :)