[04:11:26] !log shinken Restarted ircecho service on shinken-01.shinken.eqiad.wmflabs [04:11:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Shinken/SAL [05:14:36] thanks bd808 I was just about to poke at shinken [05:29:29] madhuvishy: I was too lazy to look and see what it's problem was. I just restarted and looked for it to join channels. [13:35:37] !log tools tools-mail almouked@ltnet.net 719 pending messages cleared [13:35:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:45:39] chasemp: how did you discovered that? ^^^ [13:47:48] arturo: that terrible spammy shinken instance in cloud, we can add you :) [14:08:53] arturo: https://gerrit.wikimedia.org/r/#/c/404446/ :) [14:10:07] great thanks [15:52:27] !log tools depooling tools-exec-1401, 1407, 1408, 1430, 1431, 1432, 1435, 1438, 1439, 1441, tools-webgrid-lighttpd-1402, 1407 for host reboot [15:52:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [16:19:30] !log tools repooling tools-exec-1401, 1407, 1408, 1430, 1431, 1432, 1435, 1438, 1439, 1441, tools-webgrid-lighttpd-1402, 1407 after host reboot [16:19:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:08:35] !log tools depooling tools-exec-1405, 1425, tools-webgrid-generic-1403, tools-webgrid-lighttpd-1401, 1405 for host reboot [17:08:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:26:56] !log tools repooling tools-exec-1405, 1425, tools-webgrid-generic-1403, tools-webgrid-lighttpd-1401, 1405 after host reboot [17:27:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:28:43] !log tools disabling tools-webgrid-generic-1402, tools-webgrid-lighttpd-1403, tools-exec-1403 for host reboot [17:28:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:41:53] hello! 
I am currently unable to access the crontab on any of my tools. Is there some maintenance going on or should I create a phab ticket? [17:42:14] The error I get is `ssh: connect to host tools-cron-01.tools.eqiad.wmflabs port 22: No route to host` [17:42:20] `/usr/local/bin/crontab: unable to execute remote crontab command` [17:42:31] paws just fell over as well (may be unrelated), giving 502's [17:44:49] actually crontab is loading now [17:47:25] musikanimal ebernhardson hi, there's maint going on [17:47:59] ok thanks :) [17:48:36] !log tools depooling tools-exec-1402, 1426, 1429, 1433, tools-webgrid-lighttpd-1408, 1414, 1424 [17:48:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:55:21] uhm, volans says wikibugs isn't logging in -operations and I tried to restart it with `fab deploy:wb2-phab` but that returned an error [17:56:01] paladox: maybe maintenance related? [17:56:10] Yep [17:56:23] Labs is doing reboots for the kernal. [18:04:36] !log tools repooling tools-exec-1402, 1426, 1429, 1433, tools-webgrid-lighttpd-1408, 1414, 1424 [18:04:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [18:06:26] !log tools depooling tools-exec-1404 and 1434 for host reboot [18:06:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [18:14:30] https://tools.wmflabs.org/ is currently down fyi andrewbogott [18:15:19] hm [18:15:29] wonder if I hit the proxy by accident? I've been watching for it though [18:15:53] andrewbogott: well it seems to be that tool, not the proxy [18:16:09] hm shouldn't it re-schedule itself? 
[18:16:10] andrewbogott: huh and now it's rescheduled I think :) [18:16:16] it took a minute [18:16:18] oh good :) [18:16:23] maybe because k8s nodes are so in flux [18:26:09] !log tools repooling tools-exec-1404 and 1434 for host reboot [18:26:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [18:31:12] !log tools depooling tools-exec-1422 and tools-webgrid-lighttpd-1413 for host reboot [18:31:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [18:34:30] !log tools moving proxy from tools-proxy-02 to tools-proxy-01 [18:34:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [19:17:57] I think the reboot killed wikibugs processes [19:19:06] https://www.mediawiki.org/wiki/Wikibugs [19:19:49] twentyafterfour i guess you can try and restart it now :). [19:20:38] paladox: meh, same error [19:20:43] oh [19:20:58] twentyafterfour what error is it returning? [19:21:18] invalid queue or job "wb2-phab" [19:21:23] oh [19:21:27] Fatal error: sudo() received nonzero return code 1 while executing! [19:22:17] twentyafterfour try fab start_jobs [19:26:01] paladox: not much better luck [19:26:07] oh hmm [19:26:08] Your job 984596 ("wb2-phab") has been submitted [19:26:10] but: [19:26:20] sudo() received nonzero return code 1 while executing '/usr/bin/jsub -N wb2-grrrrit [19:26:22] and [19:26:31] received nonzero return code 1 while executing '/usr/bin/jsub -N wb2-irc [19:26:52] Hmm [19:26:58] oh, I saw wikibugs update in #wikimedia-operations [19:28:10] Hmm [19:28:14] gerrit changes no showing [19:28:26] Is it having a problem connecting to gerrit. [19:29:09] twentyafterfour ^^ [19:29:25] I restarted the gerrit job once more, it didn't error this time [19:29:34] Oh :). [19:29:48] works [19:29:52] thanks twentyafterfour :) [19:30:03] :) [19:30:05] Speaking of Gerrit, is the bot going to be able to handle gerrit's upgrade? 
[19:30:29] Depends on if the bot expects a string for the change number [19:38:08] tx twentyafterfour [19:46:02] !log depooling tools-webgrid-lighttpd-1427 tools-webgrid-lighttpd-1428 tools-exec-1413 tools-exec-1442 for host reboot [19:46:03] andrewbogott: Unknown project "depooling" [19:48:36] !log tools depooling tools-webgrid-lighttpd-1427 tools-webgrid-lighttpd-1428 tools-exec-1413 tools-exec-1442 for host reboot [20:00:48] !log depooled and repooled tools-webgrid-lighttpd-1427 tools-webgrid-lighttpd-1428 tools-exec-1413 tools-exec-1442 for host reboot [20:00:49] andrewbogott: Unknown project "depooled" [20:00:57] !log tools depooled and repooled tools-webgrid-lighttpd-1427 tools-webgrid-lighttpd-1428 tools-exec-1413 tools-exec-1442 for host reboot [20:01:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:02:23] !log tools depooling tools-exec-1409, 1410, 1414, 1419, 1427, 1428 tools-webgrid-generic-1401, tools-webgrid-lighttpd-1406 [20:02:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:02:48] !log shinken reboot shinken-01 [20:02:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Shinken/SAL [20:19:16] !log tools repooled tools-exec-1409, 1410, 1414, 1419, 1427, 1428 tools-webgrid-generic-1401, tools-webgrid-lighttpd-1406 [20:19:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:20:07] !log tools depooling tools-webgrid-lighttpd-1412 and tools-exec-1423 for host reboot [20:20:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:33:46] !log tools depooling tools-exec-1406, 1421, 1436, 1437, tools-webgrid-generic-1404, tools-webgrid-lighttpd-1409, tools-webgrid-lighttpd-1411, tools-webgrid-lighttpd-1418, tools-webgrid-lighttpd-1425 [20:33:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:34:11] hi [20:34:18] I guess this is the place to wait 
for extdist.wmflabs.org to be up? [20:34:36] >Rolling reboots for kernel patches [20:34:40] hmm, *the* patches? [20:35:49] oh, it is already up [20:37:03] lazowik: it's likely the instance that points to was rebooted [20:40:28] Hello, fyi none of the deployment-prep webproxies (like parsoid-beta.wmflabs.org etc) work right now. Judging by the activity in the channel I guess that's expected? [20:43:54] Pchelolo, yeah [20:44:14] ok, cool, just checking. btw right now they're working [20:44:19] the proxy in use for that is not deployment-prep specific though [20:44:39] to be honest it could've been done in a way that avoided knocking out novaproxy [20:46:03] !log tools repooled tools-exec-1406, 1421, 1436, 1437, tools-webgrid-generic-1404, tools-webgrid-lighttpd-1409, tools-webgrid-lighttpd-1411, tools-webgrid-lighttpd-1418, tools-webgrid-lighttpd-1425 [20:46:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:46:28] !log tools depooling tools-exec-1416, 1418, 1424, tools-webgrid-lighttpd-1404, tools-webgrid-lighttpd-1410 [20:46:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:47:25] Krenair: then again proxy probably could use a good restart now and then [20:48:30] Zppix, yeah but it can be done without downtime [20:48:40] there are two instances for this reason [20:49:14] Krenair: true [20:49:44] Krenair: maybe that means it need a failover type script similar to prod [20:50:06] !log shinken Run `chmod o+r /var/log/ircecho/irc-cloud-feed.log` in shinken-01 [20:53:14] madhuvishy: and puppet doesn't mess w/ that? [20:55:22] chasemp: no.. [20:56:36] PAWS looks okay post reboots so far, ping me if y'all run into issues! 
[20:56:59] !log tools repooling tools-exec-1416, 1418, 1424, tools-webgrid-lighttpd-1404, tools-webgrid-lighttpd-1410 [20:57:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:58:53] !log tools depooling tools-exec-1412, 1415, 1417, tools-webgrid-lighttpd-1415, 1416, 1422, 1426 [20:58:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [21:04:04] are you aware that the grid master seems down? [21:04:24] Was about to ask if there was a problem, just got a ton of mails from cron [21:04:26] I seem to get an error aswell [21:06:25] there it is again [21:07:29] annika: Wiki13 SQL the grid master was down temporarily for the maintenance in progress [21:07:37] it is hopefully back and I'm running a few tests [21:07:55] ah I see, I was wondering about that if that would have been the case [21:08:15] thx for the status [21:08:40] Thanks! [21:14:11] !log depooling tools-exec-1420 and tools-webgrid-lighttpd-1417 [21:14:12] andrewbogott: Unknown project "depooling" [21:14:27] !log tools depooling tools-exec-1420 and tools-webgrid-lighttpd-1417 [21:14:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [21:24:55] !log tools repooled tools-exec-1420 and tools-webgrid-lighttpd-1417 [21:25:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [21:54:38] !log tools tools-exec-1436:~$ /sbin/reboot [21:54:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [22:01:00] !log tools qstat -explain E -xml | grep 'name' | sed 's/<name>//' | sed 's/<\/name>//' | xargs qmod -cq [22:01:03] !log ores deleted ores-redis-01 (seemingly unused) [22:01:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [22:01:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL [22:04:55] Hey! I see a couple of toolforge tools down. Something messed up?
[22:05:09] Niharika: they should have restarted [22:05:30] chasemp: https://codesearch.wmflabs.org/search/ and https://scholarships.wmflabs.org/apply are down. [22:05:34] or big brother should do its thing but otherwise we had to reboot everything for meltdown fixes [22:05:35] Totally unrelated tools. [22:05:43] I don't think those are in Toolforge are they? [22:05:54] Uhh are they in Labs? [22:05:56] it is probably that the webserver there did not come back post reboot [22:05:57] right [22:06:11] Okay I thought Labs was no longer a thing. Sorry! [22:06:20] s/Cloud VPS/Labs != Toolforge [22:06:31] Got it. [22:06:50] Niharika: afaik things are stable now but some things may not have started automatically post reboot [22:06:55] !log ores created ores-web-02 as a debian stretch instance. [22:06:56] Do those tools have to be manually started? [22:06:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL [22:07:43] Niharika: most likely [22:08:32] Niharika: there are services that are up behind that reverse proxy service so I believe it's a few individual projects that probably do not come up gracefully on boot [22:08:39] the owners will have to take a look [22:09:11] chasemp: Okay, thank you.
[22:09:35] Niharika: thanks for checking :) [22:10:06] (Draft2) Zppix: Wikimedia-AI host migration/cleanup [labs/icinga2] - https://gerrit.wikimedia.org/r/404584 [22:10:23] :) [22:15:45] !log ores pooling ores-web-01/02 [22:15:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL [22:23:31] !log ores deleting ores-web-03/05 [22:23:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL [22:25:26] !log ores shutting down ores-worker-05/06/07 [22:25:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL [22:27:29] !log ores deleting ores-worker-05/06/07 [22:27:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL [22:33:37] !log ores creating ores-worker-01/02/03/04 as stretch instances. [22:33:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL [22:38:20] Niharika: fwiw I looked at teh codesearch project and tried to breath life into it but not sure what's missing atm. I think quiddity is going to drop lego.ktm a note. [22:38:55] chasemp: Okay. For the other instance I just needed to up vagrant and it got on fine. Thanks for your help! [22:40:48] Niharika: or maybe it did work and I was impatient https://codesearch.wmflabs.org/search/ [22:41:18] Ha! It did work! \o/ [22:49:53] !log codesearch restarted everything I could find in the stack :) [22:49:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL [23:14:20] !log ores shutting down ores-worker-08/09/10 [23:14:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL [23:18:56] !log ores deleting ores-worker-08/09/10 [23:18:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL [23:36:47] !log ores created ores-worker-05/06 as Debian Stretch [23:36:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL